Current Protein and Peptide Science, 2012, 13, 55-75 






How Random are Intrinsically Disordered Proteins? A Small Angle 
Scattering Perspective 

Veronique Receveur-Brechot¹,* and Dominique Durand²,*

¹IMR-CNRS - UPR3243, 31, Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France; ²IBBMC, CNRS UMR 8619,
Université Paris-Sud, 91405 Orsay, France

Abstract: While the crucial role of intrinsically disordered proteins (IDPs) in the cell cycle is now recognized, deciphering
their molecular mode of action at the structural level remains highly challenging and requires a combination of
many biophysical approaches. Among them, small angle X-ray scattering (SAXS) has been extremely successful in the
last decade and has become an indispensable technique for addressing many of the fundamental questions regarding the
activities of IDPs. After introducing some experimental issues specific to IDPs in relation to the latest technical
developments, this article shows how the theory of polymer physics can be used to evaluate the flexibility of fully disordered
proteins. The different strategies to obtain three-dimensional models of IDPs, both free in solution and associated in a complex,
are then reviewed. Indeed, recent computational advances have made it possible to readily extract maximum information from
the scattering curve, with a special emphasis on highly flexible systems such as multidomain proteins and IDPs. Furthermore,
integrated computational approaches now enable the generation of ensembles of conformers that capture the unique
flexible characteristics of IDPs while taking into account the constraints of an increasingly wide variety of complementary
experiments. In particular, combining SAXS with high-resolution techniques, such as X-ray crystallography and NMR,
provides reliable models and unique structural insights into the protein over multiple structural scales.
The latest neutron scattering experiments also promise new advances in the study of conformational changes in
more complex macromolecular systems.
1. INTRODUCTION

Intrinsically disordered proteins (IDPs) are currently in
the limelight of the most recent and exciting structure-
function relationship studies. These proteins have
overthrown the long-held idea that a definite 3D structure of a
protein dictates its function [1, 2]. Far from being the
exception that proves the rule, they have proved to be extremely
abundant in the cell, especially in eukaryotes, and have been
shown to fulfill numerous essential functions in the cell
cycle [3, 4]. Most of them participate in intricate interaction
networks and are involved in molecular recognition
processes with multiple partners [5, 6], sometimes through an
induced folding mechanism, in which the disordered protein
gains secondary structural elements upon binding to its
partner [7]. Because these proteins have been shown to be at the
crossroads of many disease-related signaling pathways, they
are considered a rich and unexplored reservoir of original
targets for new drug design strategies based on
protein-protein interactions [8-10]. Understanding the molecular
basis of their function is therefore crucial, yet linking
their structural properties to their function remains very
challenging because they lack a rigid, regular structure.
Elucidating their structure-function specificities
is extremely difficult, sometimes impossible, using a single
classical structural method, such as X-ray diffraction or
nuclear magnetic resonance (NMR) [11]. Only a strategic
combination of complementary structural and biophysical
techniques can decipher their mode of action
at the structural level [12]. Among these techniques, small
angle X-ray and neutron scattering (SAXS and SANS) have
become increasingly valuable and effective, and are
particularly well adapted to the study of such proteins. The recent
extraordinary success of small angle scattering techniques
has been made possible by the growing number of programs
and algorithms developed in recent years to
exploit all the structural information on protein conformation
contained in the data [13, 14]. Furthermore, especially
in the case of proteins containing disordered domains
or bound to a structured partner, small angle scattering
techniques are extremely powerful when used in combination
with other methods, in particular with high-resolution
methods such as NMR or X-ray crystallography.

SAXS has long been used in protein folding studies,
benefiting from earlier studies in polymer chemistry [15]. In
particular, it was one of the very few techniques that could
characterize the denatured state of proteins, which represents
the other half of the folding equation [16, 17], as well as
identified folding intermediates, in particular the molten
globule and premolten globule states [18]. It is therefore logical
that some of the pioneering studies on IDPs used SAXS to
demonstrate the disordered nature of a native protein [19, 20].
SAXS is also the method of choice for the study of proteins that
cannot crystallize, which is typically the case for IDPs [21, 22].
It is currently the only available structural method for the
study of large flexible proteins [13, 14, 23].

The term IDP actually covers a wide variety of objects,
as pointed out by Dyson & Wright [24], ranging from fully
disordered proteins, similar to random polymers, to
multidomain proteins containing long or short disordered
regions along with very well ordered domains, and even to
molten globules that retain all or most of their secondary
structure. As mentioned above, many intrinsically disordered
regions may also be involved in molecular recognition and in
the formation of macromolecular complexes. The strategy for
analyzing SAXS data depends on the type of
object under study, the biological questions being addressed,
and the kind of complementary structural and biophysical
data that are also available. In particular, SAXS allows one
to (i) determine the molecular dimensions of a protein in
solution, (ii) infer the low-resolution shape of a protein or
complex in solution, (iii) determine the structural arrangement of
multidomain proteins, and (iv) assess the flexibility of a
polypeptide chain through the distribution of conformations
it can adopt. When combined with high-resolution methods,
the structural features of the protein or complex may then be
described in more detail, allowing a deeper understanding
of its molecular characteristics. Furthermore, the most recent
advances in computational analysis now make it possible to
take full advantage of combining SAXS with numerous
other techniques, especially through the wide variety of
information provided by NMR, and to generate large
ensembles of meaningful conformations [25-32]. Consequently,
SAXS is now able to provide a comprehensive picture of the
dynamic behavior of IDPs and their structural properties,
both free in solution and bound to a partner. The most recent
successes using SAXS attest to its ability to answer real
biological questions related to these enigmatic and fascinating
disordered proteins and their functions. SAXS will therefore
be at the forefront of forthcoming structural studies of
IDPs.

In the present paper, we review the various strategies that
can be employed to decipher the structural and dynamic
features of IDPs using SAXS. The limits and pitfalls of data
acquisition, data analysis, and interpretation of results will
also be underlined, because interpreting SAXS data is an
inherently underdetermined, ill-posed problem with few
safeguards. Finally, the latest advances in the technique,
especially in combination with cutting-edge computational
and biophysical methods, will be presented, illustrating the
fundamental questions regarding IDPs that these studies can
valuably address.

2. PRACTICAL EXPERIMENTAL ISSUES 

A SAXS experiment measures the scattering intensity
I(q) as a function of the scattering angle 2θ, expressed through
the scattering vector q defined by $q = 4\pi\sin\theta/\lambda$, where λ is the
wavelength of the radiation. According to Bragg's law, the
real-space distance corresponding to the scattering vector
q is given by $d = 2\pi/q$. The scattering curve I(q) hence
contains information, in reciprocal space, on the structure of
the object in solution at different distance scales, typically
ranging from ~10 to several hundred Angstroms. The
maximum size of the object that can be studied by SAXS is
limited only by the smallest angle, or scattering vector q, at
which the instrument can measure the scattering intensity.
This minimum scattering vector, qmin, in reciprocal space, or the
corresponding maximum measurable distance in real space,
is therefore the most important parameter of a SAXS
experiment, as it determines the maximum size of the object
that can be studied. As in crystallography, the resolution
of the experiment is defined by the smallest distance that can
be attained and therefore by the maximum value of the
scattering vector, qmax. Typically, a synchrotron
SAXS experiment allows a resolution of about a dozen
Angstroms, i.e., qmax ≈ 0.5 Å⁻¹. However, in contrast to X-ray
diffraction, which defines the resolution as the minimum
distance that can be resolved between two separate objects, the
notion of resolution in SAXS is vaguer, because of the
orientational averaging of the proteins in solution and
because there is no parameter, such as I/σ(I) in
crystallography, to assess the signal-to-noise ratio.
The maximum scattering vector qmax therefore does not yield
the smallest distance allowing two distinct objects to be
separated, but only a distance below which details
provided by modeling are not significant. Notable efforts are
underway to extend the SAXS limits up to q ≈ 2 Å⁻¹, i.e.,
distances of 3-4 Å, by covering the wide angle X-ray scattering
(WAXS) domain in order to better characterize the solvent
surrounding the protein [33] and to follow subtle conformational
changes upon, for example, ligand binding [34-36].
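The two relations above, q = 4π sin θ/λ and d = 2π/q, can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the scattering angle 2θ is assumed to be given in degrees, and the wavelength in Ångström.

```python
import math


def scattering_vector(two_theta_deg: float, wavelength_a: float) -> float:
    """Return q = 4*pi*sin(theta)/lambda in 1/Angstrom.

    two_theta_deg is the scattering angle 2*theta in degrees,
    wavelength_a the radiation wavelength in Angstrom.
    """
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength_a


def real_space_distance(q: float) -> float:
    """Return the real-space distance d = 2*pi/q (Angstrom) probed at q (1/Angstrom)."""
    if q <= 0:
        raise ValueError("q must be strictly positive")
    return 2.0 * math.pi / q


if __name__ == "__main__":
    # qmax ~ 0.5 1/A (typical synchrotron SAXS) versus q ~ 2 1/A (WAXS extension)
    for q in (0.5, 2.0):
        print(f"q = {q:.1f} 1/A  ->  d = {real_space_distance(q):.1f} A")
```

With qmax ≈ 0.5 Å⁻¹ this gives d ≈ 12.6 Å, consistent with the "dozen Angstroms" resolution quoted above, and q ≈ 2 Å⁻¹ gives d ≈ 3.1 Å. Combining both functions gives d = λ/(2 sin θ), i.e., Bragg's law.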

The measured scattering profile at very low angles is 
highly sensitive to the presence of large objects, especially 
aggregates. Because this part of the scattering curve contains 
information on the dimensions of the protein, it is therefore 
crucial to be able to discriminate between the effect of ag- 
gregates and the effect of a wide variety of extended and 
collapsed conformations adopted by the IDP on the scatter- 
ing curve. The presence of aggregates in solution will trans- 
late into an increase in the scattering intensity at low q, pre- 
venting an accurate measurement of the radius of gyration 
and maximum diameters. This issue is particularly crucial for
IDPs because they have large dimensions, which reduces the
q-range over which the Guinier law applies (see below). The
experimentally accessible q-range allowing one to determine the
molecular dimensions of IDPs is therefore often very small.
Consequently, highly monodisperse samples with no trace
of aggregation are absolutely required. One solution is a
size-exclusion FPLC or HPLC device connected upstream of the
measurement capillary, which separates the aggregates from the
protein online. Such a set-up is now available on several
beamlines, such as SWING at the SOLEIL synchrotron near
Paris, France [37], the BL-10C station at the Photon Factory
in Tsukuba, Japan [38, 39], and the SAXS/WAXS beamline of
the Australian Synchrotron in Melbourne, Australia [40].

Another concern is the effect of intermolecular interac- 
tions in non-ideal solutions on the measured scattering inten- 
sity at low angles. The experimental scattering intensity is 
expressed as $I(q) = F(q)\,S(q)$, where F(q) is the form factor
of the scattering object, which contains all the information
on the shape of the protein, and S(q) is the structure factor,
which is related to interparticle interactions. The structure
factor is equal to 1, $S(q) = 1$, for an ideal solution
without any intermolecular interactions, and tends toward 1
only at medium and large q values for real solutions.
The scattering intensity at low q is thus artificially
decreased or increased in the presence of repulsive or
attractive interactions, respectively. Measurements at different
protein concentrations and extrapolation to zero concentration
are therefore often required to eliminate the contribution
of the structure factor to the measured scattering intensity at
low angles.
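A minimal sketch of the extrapolation to zero concentration is given below. It assumes NumPy, buffer-subtracted curves sampled on a common q grid, and a linear dependence of I(q)/c on c at each q, which is a common first-order approximation for dilute solutions rather than a general law. The function name is hypothetical.

```python
import numpy as np


def extrapolate_to_zero_concentration(concentrations, intensities):
    """Extrapolate concentration-normalized scattering I(q)/c to c = 0.

    concentrations: shape (n_c,), e.g. in mg/mL.
    intensities:    shape (n_c, n_q), buffer-subtracted I(q) for each
                    concentration, all sampled on the same q grid.
    Returns the c -> 0 limit of I(q)/c, shape (n_q,).
    """
    c = np.asarray(concentrations, dtype=float)
    i = np.asarray(intensities, dtype=float)
    if c.size < 2:
        raise ValueError("at least two concentrations are needed")
    normalized = i / c[:, None]                      # I(q)/c for each concentration
    # Fit I(q)/c = a(q) + b(q) * c independently at every q; a(q) is the c -> 0 limit.
    design = np.column_stack([np.ones_like(c), c])   # shape (n_c, 2)
    coeffs, *_ = np.linalg.lstsq(design, normalized, rcond=None)
    return coeffs[0]
```

The fitted slope b(q), available as `coeffs[1]` inside the function, is informative as well: a negative slope at low q reflects intensity decreasing with concentration (repulsive interactions), a positive slope the opposite (attractive interactions).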

In any case, a careful inspection of the experimental
value of the forward intensity I(0), which is related to the
molecular weight Mw of the protein (eq. 1), is required to detect
the presence of aggregates or intermolecular interactions:

$$I(0) = \frac{c\,M_w}{N_A}\left(\rho_p - \rho_s\right)^2 \bar{v}_p^{\,2} \tag{1}$$

where c is the protein concentration in solution, $N_A$ is
Avogadro's number, $\rho_p$ and $\rho_s$ are the scattering length
densities of the protein and of the solvent, respectively, and $\bar{v}_p$ is
the partial specific volume of the protein. SAXS is one of the very
few methods that can directly determine the molecular
weight of a macromolecule. In contrast to dynamic light
scattering or size exclusion chromatography, for example,
the measurement does not rely on the hydrodynamic
properties of the macromolecule or on any assumption about the
shape of the protein. An accurate determination of the
molecular weight depends strongly on (i) the accuracy of the
I(0) determination through the Guinier or the Debye law (see
below), which is therefore very sensitive to the presence of
aggregates in the solution and to intermolecular interactions; (ii)
the calibration of the measured data on the absolute scale
(cm⁻¹), for which pure water is preferable to a standard
protein, because the scattering intensity of water is precisely
tabulated whereas the concentration and specific volume of a
standard protein cannot be determined as precisely; (iii) the
accuracy of the protein concentration measurement, which
requires good knowledge of the extinction coefficient of the
protein, whereas IDPs are often depleted of tryptophans; and
(iv) the calculation of the specific volume $\bar{v}_p$ of the protein
using, for example, the NucProt program [41], SEDNTERP
(http://www.jphilo.mailway.com/default.htm and [42]) or
other tables [43]; notably, unstructured proteins tend to have
a lower specific volume than globular folded proteins, which
often display pockets [44], giving rise to slightly lower I(0).
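Rearranging Eq. (1), I(0) = c Mw (ρp − ρs)² v̄p² / N_A, gives the molecular weight from an absolutely calibrated I(0). The sketch below assumes CGS units (I(0) in cm⁻¹, c entered in mg/mL and converted to g/cm³, contrast Δρ = ρp − ρs in cm⁻², partial specific volume in cm³/g). The function names and the numbers used to exercise them are purely illustrative, not recommended values.

```python
AVOGADRO = 6.02214076e23  # 1/mol


def forward_intensity(mw_g_per_mol, conc_mg_per_ml, delta_rho_cm2, vbar_cm3_per_g):
    """I(0) in 1/cm from I(0) = c * Mw * (delta_rho * vbar)**2 / N_A (CGS units)."""
    c = conc_mg_per_ml * 1e-3  # mg/mL -> g/cm^3
    return c * mw_g_per_mol * (delta_rho_cm2 * vbar_cm3_per_g) ** 2 / AVOGADRO


def molecular_weight(i0_per_cm, conc_mg_per_ml, delta_rho_cm2, vbar_cm3_per_g):
    """Invert the same relation: Mw (g/mol) from an absolute-scale I(0) in 1/cm."""
    c = conc_mg_per_ml * 1e-3  # mg/mL -> g/cm^3
    return i0_per_cm * AVOGADRO / (c * (delta_rho_cm2 * vbar_cm3_per_g) ** 2)
```

Because Mw is proportional to I(0)/c, a given relative error in the concentration (item iii above) propagates one-to-one into the relative error on Mw, in addition to the errors on I(0) and on the calibration.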

Typical scattering curves of IDPs are characterized by the 
absence of any specific feature, contrary to globular objects 
[45]. IDPs indeed exhibit many different conformations, 
which all display a different scattering profile. The resulting
scattering curve is a combination of these numerous
contributions and is therefore considerably smoothed by
averaging. Because of the absence of marked specific features on
the scattering curve of IDPs, it is essential to collect data of
the highest possible quality, with good statistics even at large
q values, and with accurate error bars, because these play a
crucial role in the accurate determination of the distance
distribution function and in the subsequent data analysis. The
reader can refer to a recent review by Jacques and Trewhella [46]
that provides a very useful set of guidelines for the 'good
laboratory practice' of a scattering experiment and for the critical
evaluation of scattering data.

Finally, considering the concentrations required for
SAXS (mg/mL of protein in the beam) to obtain a signal-to-noise
ratio sufficient for buffer subtraction, and according
to the law of mass action, reliable SAXS experiments
on complexes involving several partners (protein,
DNA, and others) require that the complex is of high affinity,
with dissociation constants (Kd) in the sub-micromolar range.
This is an essential prerequisite to avoid measuring the signal
arising from an undefined mixture of the isolated partners in
equilibrium with the complex. Here again, an HPLC column
upstream of the measurement capillary can help, provided that
the complex is stable enough not to dissociate completely
during elution from the column.
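For a simple 1:1 equilibrium A + B ⇌ AB with Kd = [A][B]/[AB], the law of mass action gives the complex concentration as a root of a quadratic equation. The sketch below (hypothetical function names; all concentrations and Kd in the same unit, here µM) illustrates why high affinity is needed: 1 mg/mL of a 20 kDa protein corresponds to 50 µM.

```python
import math


def complex_concentration(total_a, total_b, kd):
    """Concentration of AB for A + B <=> AB with Kd = [A][B]/[AB].

    Solves [AB]**2 - (A0 + B0 + Kd)*[AB] + A0*B0 = 0 and keeps the physical
    root (the one that cannot exceed min(A0, B0)). All inputs share one unit.
    """
    s = total_a + total_b + kd
    disc = max(s * s - 4.0 * total_a * total_b, 0.0)
    return (s - math.sqrt(disc)) / 2.0


def fraction_in_complex(total_a, total_b, kd):
    """Fraction of the limiting partner that is bound in the complex."""
    return complex_concentration(total_a, total_b, kd) / min(total_a, total_b)


if __name__ == "__main__":
    # Equimolar partners at 50 uM (about 1 mg/mL for a 20 kDa protein)
    for kd in (0.01, 0.1, 1.0, 10.0):
        print(f"Kd = {kd:5.2f} uM -> {100 * fraction_in_complex(50.0, 50.0, kd):.1f}% bound")
```

With these illustrative numbers, about 87% of the partners are complexed for Kd = 1 µM and only about 64% for Kd = 10 µM, so the free partners would contribute substantially to the measured curve.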

3. HOW TO EVALUATE AN IDP BY SAXS

The first characteristics that are usually inferred from
scattering data without requiring any modeling are (i) the radius of
gyration, (ii) the distance distribution function P(r), which is
the histogram of the distances within a protein, and (iii) the
scattering profile in a Kratky plot ($q^2 I(q)$ vs. q), which
directly reports on the compact or unstructured nature of the
polypeptide chain.

3.1. The Radius of Gyration 

The radius of gyration is the first parameter yielded by 
SAXS and provides information on the average size of the 
scattering object in solution. Because IDPs are prone to 
adopt large extended conformations, the radius of gyration is 
a particularly relevant parameter to evaluate an IDP using 
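SAXS. The radius of gyration, Rg, is the root mean square
of the distances r from the center of the particle, taken over
the volume V of the protein and weighted by the excess
scattering length density, as given by the following equation:

$$R_g^2 = \frac{\int_V \left(\rho(\mathbf{r}) - \rho_s\right) r^2 \, dV}{\int_V \left(\rho(\mathbf{r}) - \rho_s\right) dV} \tag{2}$$

where $\rho(\mathbf{r})$ is the scattering length density of the protein and
$\rho_s$ that of the solvent. The radius of gyration is generally
determined using the Guinier law:

$$I(q) = I(0)\exp\left(-\frac{q^2 R_g^2}{3}\right) \tag{3}$$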
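Eq. (3) is linear in a plot of ln I(q) versus q², with slope −Rg²/3 and intercept ln I(0). A minimal iterative Guinier fit can therefore be sketched as follows, assuming NumPy, a buffer-subtracted curve with q sorted in increasing order, and no aggregation: fit a starting window, restrict the window to qRg ≤ `qrg_max`, and refit until Rg converges. The ten-point starting window and the default `qrg_max = 1.3` are illustrative assumptions; as discussed below, a tighter limit is advisable for IDPs.

```python
import numpy as np


def guinier_fit(q, intensity, qrg_max=1.3, n_start=10, max_iter=50, rel_tol=1e-6):
    """Iterative Guinier fit of ln I(q) = ln I(0) - (Rg**2 / 3) * q**2.

    q must be sorted in increasing order and intensities must be positive.
    Returns (rg, i0, n_points_used).
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mask = np.zeros(q.shape, dtype=bool)
    mask[:n_start] = True                       # hypothetical starting window
    rg = None
    for _ in range(max_iter):
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        if slope >= 0:
            raise ValueError("non-negative slope: Guinier analysis not applicable here")
        new_rg = np.sqrt(-3.0 * slope)          # slope = -Rg**2 / 3
        i0 = np.exp(intercept)
        new_mask = q * new_rg <= qrg_max
        if new_mask.sum() < 3:
            raise ValueError("fewer than 3 points satisfy q*Rg <= qrg_max")
        converged = rg is not None and abs(new_rg - rg) <= rel_tol * rg
        rg, n_used = new_rg, int(mask.sum())
        if converged:
            break
        mask = new_mask                         # refit on the validity window
    return rg, i0, n_used
```

Reporting the number of points used is deliberate: for IDPs the valid window may contain only a handful of points, which directly limits the precision of Rg.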



The Guinier law can theoretically be applied to any
particle, whatever its shape. The radius of gyration is inferred
from the slope (−Rg²/3) of the Guinier plot, which represents the
logarithm of the scattering intensity versus q². A Guinier plot is
only linear over a restricted region of the scattering
spectrum: qRg < 1.0. This region may sometimes be extended to
qRg < 1.3 for well-folded proteins, but for an IDP it
is actually reduced to qRg < 0.8, and sometimes even less
[13, 47], because of the multiple sizes adopted by an IDP.
Therefore, there may only be a few usable points in the
experimental spectrum, which limits the accuracy of the
measured Rg. The Debye law offers an interesting alternative to
determine the radius of gyration of fully denatured or very
disordered proteins. It describes the behavior of a Gaussian
chain and can be applied to a wider region of the scattering
spectrum, up to qRg < 3, for a polymeric random coil [48],
although a narrower range, up to qRg < 1.4, provides a much
more reliable Rg for unstructured proteins [49]. The Debye
law is given by the following expression:



$$\frac{I(q)}{I(0)} = \frac{2}{x^2}\left(x - 1 + e^{-x}\right) \tag{4}$$

where $x = (qR_g)^2$.
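The Debye function, I(q)/I(0) = 2(x − 1 + e^{−x})/x² with x = (qRg)², suffers from numerical cancellation near q = 0, so the sketch below switches to its Taylor series 1 − x/3 + x²/12 for small x. Fitting is done with a simple scan over trial Rg values, the scale factor I(0) being obtained at each trial by linear least squares. This is a minimal NumPy sketch; the grid-search strategy and the function names are illustrative choices, not a prescribed procedure, and the q range passed in should respect the limits given above.

```python
import numpy as np


def debye(q, rg):
    """Normalized Debye function I(q)/I(0) = 2 (x - 1 + exp(-x)) / x**2, x = (q Rg)**2."""
    x = (np.atleast_1d(np.asarray(q, dtype=float)) * rg) ** 2
    out = np.empty_like(x)
    small = x < 1e-3
    xs = x[small]
    out[small] = 1.0 - xs / 3.0 + xs ** 2 / 12.0   # Taylor series avoids cancellation near q = 0
    xl = x[~small]
    out[~small] = 2.0 * (xl - 1.0 + np.exp(-xl)) / xl ** 2
    return out


def fit_debye(q, intensity, rg_grid):
    """Return (rg, i0) minimizing sum((I - i0 * debye(q, rg))**2) over rg_grid."""
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    best = None
    for rg in rg_grid:
        model = debye(q, rg)
        i0 = model @ intensity / (model @ model)   # closed-form least-squares scale factor
        resid = np.sum((intensity - i0 * model) ** 2)
        if best is None or resid < best[0]:
            best = (resid, float(rg), float(i0))
    return best[1], best[2]
```

For real data the residuals would normally be weighted by the experimental error bars, and the grid refined around the best value.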

The experimental radius of gyration of an IDP can be 
compared to theoretical or experimental values published for 
a globular and an unfolded protein of the same number of 
residues to quantitatively assess the extended nature of the 
IDP and to estimate whether the protein behaves as a random 
coil or whether it is more compact due to putative residual 
structure. Random coils usually refer to highly unfolded or 
disordered proteins with no or almost no structural elements 
with the vast majority of residues solvent exposed. 

Several systematic studies based on Flory's theory of
polymer physics [48] may be used as references to determine
how far the IDP differs from a random coil. Flory showed
that the radius of gyration follows a power law,
$R_g = R_0 N^{\nu}$, where N is the number of monomers of the
polymer, R0 is a constant (in Å), and ν depends on the structural
behavior of the polymer chain in the solvent. Thus, the
theoretical value of ν for a compact spherical globule is 1/3,
whereas the predicted value of ν is 0.5 for a random coil and
0.588 for a polymer in "good solvent", or excluded volume
chain [48]. In an excluded volume chain, the interactions
are dominated by steric repulsions between the monomers
of the chain, as for the amino acids of a polypeptide chain
in the presence of high denaturant concentrations. Plaxco
and co-workers compiled results from the literature on a wide
range of native globular proteins and of strongly denatured
proteins [50-52]. They found that the experimentally derived
radii of gyration follow this scaling law as a function of the
number N of residues of the protein, with ν = 0.38 ± 0.05
for native globular proteins [52], which is close to the 1/3
predicted for a sphere, and ν = 0.598 ± 0.028 with R0 = 1.93
for completely denatured proteins, which is close to the
expected value of 0.588 for excluded volume chains [48].
Wilkins et al. obtained similar results on a smaller subset of
native and denatured proteins, with R0 = √(3/5) × 4.75 and
ν = 0.29 for native proteins, and with R0 = 2.23 and ν = 0.58
for proteins under strongly denaturing conditions [53]. In the
case of an IDP in an aqueous buffer, Bernadó and Blackledge
infer lower values of Rg, with R0 = 2.54 and ν = 0.522, which
is closer to the expected exponent for a random coil [54]. They
obtained this result by calculating the scattering intensity of
the ensemble of conformations adopted by polypeptide chains
of N residues, and it was consistent with a small set of
experimentally derived Rg values of IDPs. However, it is now
recognized that there probably exists a continuum between
ordered and fully disordered proteins, resulting from a wide
diversity of sequences. These predictions of Rg from the number
of residues of IDPs therefore constitute lower and upper bounds,
and significant differences from the upper bound actually
reveal global or local structural restraints, indicating how the
IDP deviates from a random coil. Indeed, Plaxco and co-workers
explored the effect of local or residual structures on the
scaling behavior and dimensions of unstructured proteins
and showed that residual helical structures contract the
protein, whereas PPII helices tend to increase its dimensions
beyond the value expected for a random coil [55].
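The empirical power laws above, Rg = R0 N^ν, are easy to tabulate for a given chain length and to compare with an experimental Rg. The sketch below uses only the (R0, ν) pairs quoted in the text (Rg in Å); the native-protein law with ν = 0.38 is omitted because no R0 is given for it. The dictionary labels, function names, and the example protein are hypothetical.

```python
import math

# (R0 in Angstrom, nu) pairs as quoted in the text above
SCALING_LAWS = {
    "native, Wilkins et al. [53]": (math.sqrt(3.0 / 5.0) * 4.75, 0.29),
    "IDP random coil, Bernado & Blackledge [54]": (2.54, 0.522),
    "chemically denatured, Wilkins et al. [53]": (2.23, 0.58),
    "chemically denatured, Plaxco and co-workers [50-52]": (1.93, 0.598),
}


def flory_rg(n_residues: int, r0: float, nu: float) -> float:
    """Radius of gyration (Angstrom) predicted by Rg = R0 * N**nu."""
    return r0 * n_residues ** nu


def compare_to_references(n_residues: int, rg_exp: float) -> dict:
    """Ratio of the experimental Rg to each reference prediction."""
    return {
        label: rg_exp / flory_rg(n_residues, r0, nu)
        for label, (r0, nu) in SCALING_LAWS.items()
    }


if __name__ == "__main__":
    n, rg_exp = 150, 35.0  # hypothetical IDP: 150 residues, measured Rg = 35 A
    for label, ratio in compare_to_references(n, rg_exp).items():
        print(f"{label:52s} predicted {rg_exp / ratio:5.1f} A, ratio {ratio:.2f}")
```

A ratio well below 1 relative to the random-coil reference points to compaction (for instance residual helical structure), whereas a ratio above 1 may reflect extended elements such as PPII segments, in line with the observations of Plaxco and co-workers [55].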

Finally, the radius of gyration of a protein can be
compared to the hydrodynamic radius, Rh, determined by DLS or
pulsed-field gradient NMR (PFG-NMR). The hydrodynamic
radius, or Stokes radius, is the radius of the equivalent sphere
that diffuses with the same diffusion coefficient. The Rg/Rh
ratio is √(3/5) ≈ 0.775 for a globular protein and approximately
1.4 for a denatured protein. Although not very informative, any
intermediate value of this ratio indicates the presence of
some residual structure (as in molten globules and premolten
globules). This was the approach of Uversky and co-workers, for
example, who studied the intrinsically disordered
C-terminal domain of caldesmon [56] and α-, β- and
γ-synucleins [57].
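The reference ratio for a globular protein, √(3/5) ≈ 0.775, follows from treating the protein as a homogeneous sphere of radius R, whose hydrodynamic radius is approximately R. With a uniform contrast, the definition of Rg in Eq. (2) reduces to:

```latex
R_g^2 = \frac{\int_0^R r^2 \, 4\pi r^2 \, dr}{\int_0^R 4\pi r^2 \, dr}
      = \frac{R^5/5}{R^3/3} = \frac{3}{5}R^2 ,
\qquad
\frac{R_g}{R_h} \approx \frac{R_g}{R} = \sqrt{\frac{3}{5}} \approx 0.775 .
```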

3.2. The Distance Distribution Function P(r) 

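The other dimensions of the protein can be accessed
through the distance distribution function P(r), which is
inferred by Fourier transformation of the scattering intensity
I(q) (in practice an indirect Fourier transformation, since I(q) is
measured only over a finite q-range) using the programs GNOM [58]
or GIFT [59]:

$$P(r) = \mathcal{F}^{-1}\left[I(q)\right] = \frac{r^2}{2\pi^2}\int_0^{\infty} q^2 I(q)\,\frac{\sin(qr)}{qr}\,dq \tag{5}$$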
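Eq. (5) is the inverse of the forward relation I(q) = 4π ∫₀^Dmax P(r) sin(qr)/(qr) dr, which reduces to I(0) = 4π ∫ P(r) dr at q = 0. The forward direction is straightforward to evaluate numerically, as in the NumPy sketch below (trapezoidal integration; function names are illustrative). The reverse, indirect transform performed by GNOM or GIFT additionally requires regularization because I(q) is noisy and truncated, and is not attempted here. As a check, the analytical P(r) of a homogeneous sphere reproduces the sphere form factor.

```python
import numpy as np


def intensity_from_pr(q, r, p):
    """I(q) = 4*pi * integral P(r) sin(q r)/(q r) dr, evaluated by the trapezoidal rule.

    r: increasing grid from 0 to Dmax; p: P(r) sampled on r.
    """
    q = np.atleast_1d(np.asarray(q, dtype=float))
    r = np.asarray(r, dtype=float)
    p = np.asarray(p, dtype=float)
    # np.sinc(x) = sin(pi x)/(pi x), so this kernel is sin(qr)/(qr), equal to 1 at qr = 0
    kernel = np.sinc(np.outer(q, r) / np.pi)
    integrand = p[None, :] * kernel
    dr = np.diff(r)[None, :]
    return 4.0 * np.pi * np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]) * dr, axis=1)


def sphere_pr(r, radius):
    """Distance distribution of a homogeneous sphere (arbitrary scale), zero beyond 2R."""
    r = np.asarray(r, dtype=float)
    x = r / radius
    p = r ** 2 * (1.0 - 0.75 * x + x ** 3 / 16.0)
    return np.where(r <= 2.0 * radius, p, 0.0)
```

For the sphere, I(q)/I(0) should match [3(sin qR − qR cos qR)/(qR)³]², which makes a convenient sanity check of the integration grid.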



The P(r) function is a histogram of all the interatomic 
distances, r, within the protein. The maximal value of r for 
which P(r) is not equal to zero, Dmax, corresponds to the
maximum diameter of the protein. This histogram and the 
value of Dmax contain valuable information on the shape, the 
anisotropy, and the degree of compactness of the protein. 

Typical P(r) functions of IDPs are very asymmetric compared
to the highly symmetric P(r) function of globular proteins;
they lack any marked features or breaks and end with a
smooth concave curvature. They often display an extended
tail due to the variety of extended conformations present in
solution. The presence of aggregates in solution also
produces a similar extended tail. The complete absence of
aggregates in solution is therefore absolutely necessary in the
case of IDPs to avoid misinterpreting the data. The P(r)
functions of proteins containing several globular domains
tethered by long disordered regions are characterized by peaks at
low r values, corresponding to the intradomain distances, and
a tail with more or less pronounced shoulders corresponding
to the interdomain distances, depending on the flexibility
of the linker (Fig. (1)).

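The radius of gyration and forward intensity I(0) can also
be inferred from the P(r) function according to equation (6):

$$R_g^2 = \frac{\int_0^{D_{max}} r^2 P(r)\,dr}{2\int_0^{D_{max}} P(r)\,dr} \quad \text{and} \quad I(0) = 4\pi\int_0^{D_{max}} P(r)\,dr \tag{6}$$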
Fig. (1). Experimental P(r) functions of multidomain proteins. Experimental P(r) functions of the Humicola insolens cellulase Cel45 and
variants: globular catalytic domain (red curve), catalytic domain and linker (blue curve), full-length wild-type Cel45 (green curve), and
full-length Cel45 with a proline mutation leading to a more rigid linker (black curve). The crystal structures of the catalytic domain (red) and the
cellulose-binding domain (yellow) are represented in space-filling mode. The enhanced rigidity of the linker in the mutant Cel45 translates
into a P(r) function with a well-separated peak corresponding to the interdomain distances. (Figure adapted from [60]).



This alternative method to determine the radius of gyration
and I(0) is interesting because it does not rely on any
model (Debye or Guinier) and uses the entire scattering
spectrum. The Rg inferred from this equation is often slightly
larger than that obtained with the Guinier law [45], mainly because
the Guinier law is less appropriate to describe an unfolded
chain and often underestimates the radius of gyration of
extended chains. It is therefore always useful to compare
the values of Rg obtained from these two methods. Finally,
determining I(0) through the P(r) function allows one to
cross-check the values obtained by the different methods and
to ascertain the quality of the data.
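For a model structure, P(r) can be computed directly as the histogram of interatomic distances, which also illustrates the first relation of Eq. (6), Rg² = ∫ r² P(r) dr / (2 ∫ P(r) dr). The NumPy sketch below treats every atom as an identical point scatterer, an assumption that ignores atomic form factors, solvent, and hydration shell. It includes the i = j pairs so that the identity Σᵢⱼ |rᵢ − rⱼ|² = 2N²Rg² holds exactly, and compares the result with the Rg computed directly from the coordinates. Function names are illustrative.

```python
import numpy as np


def pair_distance_histogram(coords, bin_width=0.5):
    """Histogram P(r) of all ordered pairwise distances, including i == j.

    coords: (n_atoms, 3) array in Angstrom. Returns (bin centers, counts, Dmax).
    Memory grows as n_atoms**2, so this is meant for small models only.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt(np.sum(diff ** 2, axis=-1)).ravel()
    n_bins = int(np.ceil(d.max() / bin_width)) + 1
    counts, edges = np.histogram(d, bins=n_bins, range=(0.0, n_bins * bin_width))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts.astype(float), float(d.max())


def rg_from_pr(r, p):
    """Rg**2 = integral(r**2 P) / (2 integral(P)); bins are uniform, so dr cancels."""
    return float(np.sqrt(np.sum(r ** 2 * p) / (2.0 * np.sum(p))))


def rg_direct(coords):
    """Root-mean-square distance of the points from their centroid."""
    coords = np.asarray(coords, dtype=float)
    return float(np.sqrt(np.mean(np.sum((coords - coords.mean(axis=0)) ** 2, axis=1))))
```

Comparing Dmax with Rg on such models gives a feel for the text's point that Dmax reflects the most extended conformers whereas Rg is an average over all of them.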

It is also instructive to compare the values of Rg and
Dmax of an IDP. Whereas the radius of gyration is an average
dimension of all the conformers in solution, the maximum
diameter Dmax is inferred from the most extended conformations
significantly present in solution. Thus, the flexibility of the
linkers in bimodular cellulases could be assessed by comparing
these dimensions for different variants [60]. Cellulase Cel45 is
composed of a catalytic domain and a small cellulose-binding
domain whose structures have been solved. The dimensions of the
full-length protein allowed a direct inference of the maximum
extension of the linker within the protein and demonstrated that
the linker was very extended. In a variant of Cel45 in which two
amino acids of the linker were replaced by prolines, resulting
in a stretch of five consecutive proline residues, the
maximum dimensions were the same as in the wild-type
protein, whereas the radius of gyration, and thus the average
dimensions of the variant with the polyproline stretch, were
larger than those of the wild-type protein. The marked
bimodal distance distribution function of the variant compared



to the smoother shoulder observed in the P(r) function of the
wild-type cellulase (Fig. (1)) also indicated that the most
extended conformations were more abundant in the variant,
whereas the wild-type protein was more flexible and could
adopt both compact and extended conformations. The profile
of the distance distribution function and the dimensions that
it provides therefore reveal much information on the
compactness, anisotropy, and flexibility of a protein. Clearly,
only a thorough analysis of the scattering curve using an
ensemble of conformations (see below) provides quantitative
information on the flexibility and distribution of
conformations that the protein may adopt. Nevertheless, examining the
P(r) function, Dmax, and Rg provides rapid information on the
nature of the linkers and on the different subpopulations
without any assumptions, and thus guides the selection of a
strategy for the further analysis of the scattering curves.

3.3. The Kratky Plot 

The Kratky plot is an extremely useful representation of
the scattering intensity to quickly assess the globular nature
of a polypeptide chain without any modeling. The Kratky
plot represents the scattering pattern as $q^2 I(q)$ versus q. The
scattering intensity I(q) of a globular protein with a well-defined,
solvent-accessible surface follows the Porod law and
decreases as $q^{-4}$ in the large-q region. As a result, the
corresponding Kratky plot exhibits a typical bell shape with a
well-defined maximum. Conversely, for a random chain, the
scattering intensity has a limiting behavior of $q^{-2}$ at high q, as
indicated by the Debye law (Eq. 4). Therefore, the Kratky
plot of a fully unfolded protein exhibits a plateau in this q
region, sometimes followed by an increase as q increases,
depending on the local rigidity of the chain. Nevertheless,









this representation cannot distinguish between fully
folded proteins and partially unfolded proteins containing
structured regions of significant size, which also give bell-shaped
Kratky plots. To overcome this problem, Perez and co-workers
strongly recommend using a dimensionless Kratky plot [61],
as is commonly done in other fields, such as polymer
science. In this dimensionless Kratky plot, the intensity I(q) is
normalized to the forward scattering intensity I(0), and q is
normalized by the radius of gyration of the protein. Multiplying
q by the radius of gyration makes the angular scale
independent of protein size, while I(q) divided by I(0) becomes
independent of the molecular weight of the protein, since I(0) is
proportional to the molecular weight (Eq. 1). This
normalization allows one to compare Kratky plots of globular and
extended proteins, whatever their size, and thereby to extract the
maximum amount of information from this representation.
The scattering pattern of a globular protein in a normalized
Kratky plot, $(qR_g)^2 I(q)/I(0)$ versus $qR_g$, exhibits a maximum
value of 1.104 at $qR_g = \sqrt{3}$, whatever the size of the protein.
Conversely, for a random chain, the curve rises with increasing
angle to reach a nearly flat region at a value between 1.5 and 2,
followed at high q values (typically q > 0.2-0.3 Å⁻¹) by a
further increase depending on the rigidity of the polypeptide
chain. Dimensionless Kratky plots of partly disordered proteins
display distinctive intermediate profiles between the two
extremes (Fig. (2)).
4. ASSESSING THE FLEXIBILITY OF FULLY DISORDERED PROTEINS WITH THE THEORY OF POLYMER PHYSICS

Some IDPs are disordered along their entire sequence, whereas other so-called IDPs actually contain one or several long intrinsically disordered regions (IDRs) separated by globular domains with a definite function. If these IDRs remain active when isolated from the rest of the protein, they constitute individual domains, and their structural and functional properties are often studied individually. Comparing the radii of gyration of these fully disordered proteins or domains with the Rg expected from the empirical power law described above can provide information on the degree of structural disorder in the protein. However, because it is a global parameter, the radius of gyration is not sensitive enough to detect slight conformational restraints. Analyzing the entire scattering curve is a step forward, as it exploits all the quantitative information contained in the scattering pattern.
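As a worked example of this comparison, the sketch below assumes that the empirical power law has the Flory form Rg = R0 N^ν. The default values R0 = 1.927 Å and ν = 0.598 are an assumption (values commonly quoted for chemically denatured proteins) and may differ from the parameters given earlier in this article; the function names and the 28 Å "measured" value are hypothetical.

```python
def expected_rg(n_residues: int, r0: float = 1.927, nu: float = 0.598) -> float:
    """Flory-type power law Rg = R0 * N**nu, in Å.
    The default R0 and nu are assumed values, not taken from this article."""
    return r0 * n_residues ** nu

def compaction_ratio(rg_measured: float, n_residues: int) -> float:
    """Measured / expected Rg: < 1 suggests a chain more compact than a random
    coil (e.g. residual structure), > 1 a more extended chain."""
    return rg_measured / expected_rg(n_residues)

# Hypothetical example: a 100-residue IDP with a measured Rg of 28 Å
print(round(expected_rg(100), 1))
print(round(compaction_ratio(28.0, 100), 2))
```

As stressed above, such a single-number comparison cannot reveal subtle restraints; it only flags gross deviations from random-coil dimensions.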

Fig. (2). Normalized Kratky plots. The scattering patterns of globular proteins in a normalized Kratky plot exhibit a bell-shaped profile with a clear maximum value of 1.104 at qRg = √3, regardless of the size of the protein, and are all nearly superimposable in the range 0 < qRg < 3. Conversely, for a random chain, the curve rises with increasing angle to nearly reach a plateau between 1.5 and 2, and may increase further at q > 0.2-0.3 Å⁻¹, depending on the persistence length and the internal structure of the protein. Bell-shaped profile of a globular protein (PolX, blue line); curves of proteins consisting of several domains tethered by linkers, with rather compact conformations (p47phox, dotted green line) or extended conformations (p67phox, continuous red line); curve of a fully disordered protein with very short elements of secondary structure (XPC, dotted grey line); and curve of a fully disordered and extended protein with short segments of polyproline repeats (salivary protein IB5, continuous purple line).

The theory of polymer solutions can be used to describe the behavior of highly unfolded or disordered polypeptide chains in solution with the worm-like chain model (WLC, also referred to as the Kratky-Porod chain model) [62]. The worm-like chain is a model chain with a persistence length that takes into account the local rigidity of the polypeptide chain. This rigidity accounts for the range of possible torsion angles between two adjacent residues and also for putative
residual structure. Two parameters describe this Kratky-Porod chain: the contour length, L, and the statistical length or Kuhn length, b, which is twice the persistence length. The persistence length is a measure of the stiffness of the polypeptide and is defined as the length over which the polymer naturally stays straight. A higher persistence length indicates higher rigidity. This rigidity may be due to excluded volume interactions in the case of proteins denatured by chaotropic agents, such as urea or guanidinium chloride, or to the presence of structural elements in the case of intrinsically disordered proteins in aqueous solution. The contour length, L, is the length of the linearly extended chain without stretching the backbone. The scattering intensity follows this expression:

$$\frac{I(q)}{I(0)} = \frac{2\left(e^{-x} + x - 1\right)}{x^{2}} + \frac{b}{L}\left[\frac{4}{15} + \frac{7}{15x} - \left(\frac{11}{15} + \frac{7}{15x}\right)e^{-x}\right]$$

with x = q²Lb/6. This formula is valid for L/b > 10, which means that the chain is long compared to the statistical length, and for q < 3/b. For a completely unfolded or disordered protein, such as a random coil, the value of the statistical length b is expected to be about 18-20 Å [47]. Similarly, the theoretical contour length of a random coil is equal to N l₀ f, where N is the number of residues, l₀ is the distance between two consecutive Cα atoms (l₀ = 3.78 Å), and f is a geometrical factor, equal to 0.95, that arises from the fact that an unfolded chain is not linear but zigzags. A smaller contour length reveals the presence of local structures. Finally, the radius of gyration of a random coil can also be inferred from the values obtained for L and b according to the following relationship:

$$R_g^{2} = \frac{Lb}{6}\left[1 - \frac{3}{2y} + \frac{3}{2y^{2}} - \frac{3}{4y^{3}}\left(1 - e^{-2y}\right)\right]$$

where y = L/b. Such analyses using the Kratky-Porod model
were first performed on completely denatured globular proteins. It was thus shown that CheY denatured by 5 M guanidinium chloride displays significant rigidity all along the polypeptide chain, due to the excluded volume generated by solvation of the chain by the denaturant: b was equal to 28 Å, and L was lower than the value expected for a random coil [63]. In contrast, heat-denatured neocarzinostatin (NCS) exhibits values of L and b close to those of a random coil [49]. These results can be compared with those obtained for IDPs. For example, the radius of gyration of the intrinsically disordered XPC is slightly lower than that expected for a random coil according to equation (8), which is consistent with the existence of short elements of secondary structure observed by circular dichroism [64]. Conversely, the Rg of the disordered PIR domains corresponds better to a random coil [47]. Similarly, the contour length of Msh6-NTR (1078 Å, compared with 1091 Å for a random coil) and its statistical length of 18.7 Å, which yields a persistence length of 9.35 Å (roughly three amino acids), are consistent with a polypeptide chain adopting random conformations. The case of the proline-rich salivary proteins IB5 and Il-lng is more subtle than the previous examples [65]. The radius of gyration and the maximum diameter of these proteins are larger than those expected for a random coil, indicating that these proteins have strongly extended conformations. However, the statistical lengths of these two proteins, 29.7 and 29.9 Å, respectively, reveal the existence of secondary structure elements. Similarly, their contour lengths, L, of 188 and 364 Å, are significantly lower than the theoretical random-coil values of 251 and 503 Å for IB5 and Il-lng, respectively, which is also consistent with secondary structure elements such as short PPII or PPI helical fragments. Taken together, the high Rg value, the higher statistical length and the lower contour length reveal that these proteins are more extended than a classical random coil because PPI or PPII helical fragments stretch the polypeptide chain. These examples illustrate that many insights into the structural restraints in an IDP can be gained by analyzing the scattering curve with the theory of polymer physics.
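The worm-like chain quantities discussed in this section can be evaluated numerically. The sketch below assumes the Sharp-Bloomfield approximation for I(q)/I(0) with x = q²Lb/6, and the Benoit-Doty expression for Rg written in terms of y = L/b (both spelled out in the docstrings), together with the random-coil contour length N l₀ f given in the text. The function names are hypothetical and no fitting to experimental data is attempted.

```python
import numpy as np

L0_CA = 3.78      # Å, distance between consecutive C-alpha atoms (from the text)
F_ZIGZAG = 0.95   # geometrical factor for the zig-zag backbone (from the text)

def contour_length_random_coil(n_residues: int) -> float:
    """Theoretical contour length L = N * l0 * f, in Å."""
    return n_residues * L0_CA * F_ZIGZAG

def wlc_intensity(q, L, b):
    """Sharp-Bloomfield approximation of I(q)/I(0) for a worm-like chain:
    2(e^-x + x - 1)/x^2 + (b/L)[4/15 + 7/(15x) - (11/15 + 7/(15x)) e^-x],
    with x = q^2 L b / 6. Assumed valid for L/b > 10 and q < 3/b (q in Å^-1)."""
    q = np.asarray(q, dtype=float)
    x = q**2 * L * b / 6.0
    debye = 2.0 * (np.exp(-x) + x - 1.0) / x**2
    correction = (b / L) * (4.0 / 15.0 + 7.0 / (15.0 * x)
                            - (11.0 / 15.0 + 7.0 / (15.0 * x)) * np.exp(-x))
    return debye + correction

def wlc_rg(L, b):
    """Benoit-Doty radius of gyration with y = L/b (b = Kuhn length):
    Rg^2 = (Lb/6)[1 - 3/(2y) + 3/(2y^2) - 3/(4y^3)(1 - e^(-2y))]."""
    y = L / b
    rg2 = (L * b / 6.0) * (1.0 - 1.5 / y + 1.5 / y**2
                           - 0.75 / y**3 * (1.0 - np.exp(-2.0 * y)))
    return np.sqrt(rg2)

# Values quoted in the text
print(round(contour_length_random_coil(70)))  # 251 Å, the random-coil value quoted for IB5
print(18.7 / 2)                               # persistence length (Å) from b = 18.7 Å (Msh6-NTR)

# Hypothetical 100-residue random coil with b = 20 Å (L/b ≈ 18 > 10)
L_rc = contour_length_random_coil(100)
q = np.linspace(0.01, 3.0 / 20.0, 8)          # stay within q < 3/b
print(np.round(wlc_intensity(q, L_rc, 20.0), 3))
print(round(wlc_rg(L_rc, 20.0), 1))
```

In a real analysis, L and b would be adjusted by least squares against the measured curve within the validity range, and the resulting Rg compared with the Guinier value.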

5. 3D MODELING OF IDPS USING SAXS 

The rapidly growing success of SAXS in the past few years has arisen from the latest advances in computational SAXS data analysis and from the possibility of obtaining increasingly detailed 3D models of the macromolecule under study, even for IDPs. SAXS has thereby become extremely powerful and can provide important clues on the structural and functional mechanisms of flexible systems with crucial biological roles, including IDPs [23]. However, SAXS is confronted with the ill-posed problem of inferring a 3D structure from a 1D scattering curve, which raises the crucial question of the uniqueness of the solution, as addressed by Svergun and co-workers [66]. Consequently, the theoretical scattering curves of several different models may fit the experimental data equally well. This issue is even more acute for IDPs, which exist as conformational ensembles in solution. The strategy to resolve this ambiguity and to infer reliable models is to impose constraints on the reconstructions, as far as possible by adding external information from complementary techniques. SAXS can therefore provide more detailed structural insights into IDPs when complemented by other structural techniques, especially NMR or X-ray crystallography, which provide the high-resolution information missing from the SAXS data. Hence, 3D models that combine the information provided by SAXS and by these techniques are tremendously helpful for characterizing IDPs that contain structured domains, that are in complex with a structured partner, or that contain residual structures described by high-resolution techniques such as NMR.

5.1. Overall Shape of a Protein or Complex 

The development of new programs that restore the envelope of a scattering object ab initio from its scattering curve triggered the expansion of SAXS in structural biology a decade ago. Several programs that use different algorithms and apply different restraints to make the calculation converge faster are now available. All of these programs calculate the overall external shape of a protein by filling a volume with beads of variable size and number (a bead model). DALAI_GA, for example, uses a genetic algorithm [67], whereas SAXS3D uses a Monte Carlo-type reconstruction algorithm [68], and the program suite DAMMIN/DAMMIF/MONSA/GASBOR uses a simulated annealing procedure [69, 70]. A comparison of these programs reveals that they are all able to properly retrieve the overall shape of a well-folded protein with a similar quality of fit at high resolution [71]. A considerable effort has been made to decrease the calculation time significantly, in particular for DAMMIF with respect to DAMMIN [72]. Restraints applied in DAMMIN/DAMMIF [69] aim to minimize the interfacial area between the protein envelope and the solvent, imposing compactness and connectivity constraints, which may not be appropriate for proteins with a significant amount of intrinsic structural disorder. In addition, DAMMIN/DAMMIF fits the data only up to a resolution of about 25 Å (q ≈ 0.25 Å⁻¹) [73]. GASBOR [70] uses the entire scattering curve up to a resolution of ~10 Å to generate a bead model in which each bead corresponds to a dummy residue (a sphere of 3.8 Å diameter); the number of dummy residues equals the number of residues in the protein (with an upper limit of about 1800 dummy residues), and nearest-neighbor distribution constraints are applied. MONSA allows the description of complex objects composed of several domains with different electron densities or different scattering lengths, and is therefore very useful, for example, for protein/DNA or protein/RNA complexes, or for small angle neutron scattering (SANS).

Trying to retrieve the overall shape of a highly dynamic macromolecule such as an intrinsically disordered protein may at first glance appear meaningless. At the very least, however, this shape provides a visual picture that confirms the parameters (Rg, Dmax, contour and persistence lengths) already obtained numerically from the scattering curve, especially for entirely disordered proteins. The primary interest of the shape calculation actually lies in objects containing both globular and disordered regions. This is the case for multimodular proteins, in which linkers or long disordered regions tether globular domains, as well as for complexes between an IDP and a globular folded partner. This strategy often allows one to locate the respective positions of the globular domains whose atomic structures are already known from X-ray diffraction, NMR or molecular modeling. Information on the compactness or the degree of disorder of the linker or predicted disordered region in between can then be inferred from protruding regions of the shape or from the distances between the different folded domains inside the complex.

Because of the inherent dynamics of these objects, the calculated shape is only a rough average of the global structure of the object in solution [45]. Interestingly, whereas for a globular rigid protein the ab initio shape restoration is usually robust over numerous calculation runs, the shape reconstructions of an IDP or of a highly flexible region may vary dramatically from run to run [47, 74-76]. Therefore, for a highly flexible object, after repeating the shape calculations to check the reproducibility of the solution, it is essential to display the most typical shape among those obtained. Averaging all these shapes would smooth out their significant and informative features, which differ slightly in size and location from one shape to another, and all of the relevant information provided by a single shape would be lost. The program DAMAVER is extremely useful for rigorously selecting the most typical reconstruction. This program aligns the different shapes and calculates a normalized spatial discrepancy (NSD) between them. An NSD value below 0.7 for DAMMIN reconstructions and below 1.1 for GASBOR reconstructions indicates that the solution is stable. Significant outliers are discarded by the program, and the reconstruction with the lowest average NSD is selected [66].
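The selection step can be sketched as follows, assuming that the pairwise NSD values between runs have already been computed (that requires superimposing the bead models, which is omitted here). The outlier rule (mean NSD above the average plus two standard deviations) and the function names are assumptions meant to mimic the behaviour described above, not DAMAVER's actual code; the default 0.7 threshold is the DAMMIN value quoted in the text.

```python
import numpy as np

def row_means_off_diagonal(m):
    """Mean of each row of a square matrix, excluding the diagonal."""
    n = m.shape[0]
    mask = ~np.eye(n, dtype=bool)
    return np.array([m[i, mask[i]].mean() for i in range(n)])

def select_typical_model(nsd, stability_threshold=0.7):
    """Return (index of most typical model, indices kept, stability flag)
    from a symmetric matrix of pairwise NSD values between reconstructions."""
    nsd = np.asarray(nsd, dtype=float)
    mean_nsd = row_means_off_diagonal(nsd)
    # Assumed outlier rule: discard models whose mean NSD is far above average
    cutoff = mean_nsd.mean() + 2.0 * mean_nsd.std()
    kept = np.where(mean_nsd <= cutoff)[0]
    if kept.size < 2:
        raise ValueError("need at least two retained reconstructions")
    # Re-evaluate the mean NSD among the retained models only
    sub_means = row_means_off_diagonal(nsd[np.ix_(kept, kept)])
    best = int(kept[np.argmin(sub_means)])
    # Stable if the retained models agree, on average, better than the threshold
    stable = bool(sub_means.mean() < stability_threshold)
    return best, kept, stable
```

For GASBOR runs, `stability_threshold=1.1` would be used instead, following the values given in the text.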

The use of shape calculation for proteins or complexes containing disordered regions can be illustrated by the study of the formation of cellulosomes, which are extremely active multienzymatic cellulolytic complexes [77]. The global shape of the complex, along with the distances measured between the folded subdomains upon assembly of a minicellulosome, revealed an unexpected compaction of the linker separating the cellulase domain and the dockerin domain upon binding of the dockerin domain to the cohesin domain. These data revealed a novel mechanism of remote induced folding of a disordered region located several Ångströms from the binding site [77].

Another interesting application of shape calculation arises when the crystal structure of the complex between the folded partner and the molecular recognition element of the IDP has been solved. Determining the overall shape of the complex is then tremendously useful for investigating the putative structure of the region of the IDP not directly involved in the binding interface. A pioneering example was provided by the SAXS structure of the complex between the full-length NTAIL domain of the measles virus nucleoprotein and the X domain (XD) of the phosphoprotein [78]. A reproducible and very recognizable bulky part of the ab initio restored shape of the complex could accommodate the atomic structure of XD associated with the 20-residue-long α-MoRE of NTAIL. The rest of the shape was highly variable from one run to another but always exhibited a long protuberance with varying bends and cross-sections. These data revealed that the 90 N-terminal residues of the protein remained disordered upon binding to XD (Fig. (3)). Another example is provided by the complex between the disordered translational repressor eIF4E-binding protein 4E-BP and the initiation factor eIF4E [76]. Determining the shape of the isolated proteins and of the two proteins in complex revealed that 4E-BP wraps around eIF4E to form a fuzzy complex. These structures shed light on the mechanisms of regulation of eIF4E by the disordered 4E-BP, which involve other regions of the protein that were already suspected from former NMR studies (Fig. (4)). The overall shape of the ternary complex composed of the full-length intrinsically disordered p27, the cyclin-dependent kinase Cdk2 and cyclin A also provided insights into the mechanism by which p27 inhibits Cdk2/cyclin A to limit cell proliferation [79]. The low-resolution envelope of the ternary complex obtained using SAXS displayed a large, roughly spherical bulge, on which the crystal structure of the complex composed of the N-terminal KID domain of p27, Cdk2 and cyclin A could be superimposed, and a protruding elongated region that could accommodate an ensemble of models of C-terminal p27 obtained by molecular dynamics simulations. This structural organization indicated that the C-terminus of p27 remains highly flexible and is able to fold back onto the active site of Cdk2, where it could be phosphorylated and trigger a signaling cascade leading to degradation and cell division.
Fig. (3). 3D model of the NTAIL-XD complex of the measles virus. (a) The envelope of the complex calculated by GASBOR exhibits a bulge that was recurrent in run-to-run calculations and can accommodate the crystal structure of XD (blue) and the α-MoRE of NTAIL (red). The elongated region of the envelope varied in shape over several GASBOR calculations. (b) The scattering curve of the envelope calculated with GASBOR (red curve) fits the experimental scattering curve of the complex (black curve) perfectly. (c) A molecular model of the full-length complex was also obtained from the SAXS data using CREDO, which reconstructed the disordered region missing from the crystal structure of the complex (Figure adapted from [78]).
Fig. (4). Shape calculation and 3D model of the complex between eIF4E and 4E-BP. (a) The crystal structure of eIF4E (cyan) is perfectly superimposable on the envelope of the free protein in solution calculated with DAMMIN from the scattering data, with a small bulge corresponding to the disordered N-terminus of eIF4E. (b) X-ray crystallography showed that a short region of 4E-BP (red), visible in the electron density, undergoes induced folding into an α-helix upon binding to eIF4E. NMR studies identified other residues of eIF4E involved in the interaction (dark blue) on the side of the protein opposite to where the α-helix of 4E-BP binds, suggesting that 4E-BP (yellow) wraps around eIF4E but retains enough flexibility not to be seen by X-ray crystallography. (c) The envelope of the complex calculated with DAMMIN exhibits an upper-half region identical to the shape of eIF4E alone and an elongated region on one side of eIF4E, probably corresponding to the disordered portion of 4E-BP. This envelope, together with data from X-ray crystallography and NMR, allowed the authors to propose a model of the complex with a rather well-defined region in the close vicinity of eIF4E and a loose region corresponding to the rest of 4E-BP, which remains mostly disordered in the complex (yellow). (Figure adapted from [76]).
5.2. Conformation of Disordered Regions within Proteins or Complexes

The few examples above, selected from the growing number of such studies in the literature, show to what extent a low-resolution 3D envelope may be sufficient to provide crucial information on the overall organization of IDPs in a complex. Nevertheless, the most recent advances in SAXS make it possible to go a step further and to infer a molecular model of an isolated protein containing long disordered regions, such as a multidomain protein, or of a complex involving an IDP. These 3D models can then provide essential clues on the mechanisms of molecular recognition of IDPs, especially when they are obtained by combining results from other biophysical and structural methods that provide high-resolution information.

Several programs have been developed to restore the conformation of the polypeptide chain in the disordered regions of a protein amid the more structured regions for which the atomic coordinates are known. The program BUNCH [80] combines rigid-body and ab initio modeling. The folded domains of known structure are treated as rigid bodies, whereas the unstructured regions are modeled by chains of dummy residues. Their optimal positions and orientations are then calculated using a simulated annealing algorithm to fit the scattering data, with restraints that minimize steric clashes and discontinuities in the chains. BUNCH is particularly well adapted to multidomain proteins with disordered regions between the globular domains. An extension of BUNCH, CORAL (COmplexes with RAndom Loops), is now available; it performs the same modeling for complexes composed of several partners. If known, distance restraints between residues, such as interacting residues, can be added. As for all the other programs developed by Svergun's group, BUNCH and CORAL are readily available at http://www.embl-hamburg.de/biosaxs/.

The program DADIMODO has also been developed for proteins or complexes containing both structured and disordered domains [81]. It is based on a genetic algorithm and has been designed to combine SAXS and NMR data. Distance restraints, such as those provided by chemical shift mapping, and orientational restraints, provided by residual dipolar couplings (RDCs), may be added to the algorithm [82]. Unlike BUNCH, the program can also deal with very extended particles, because it is not limited by the size of the complex or by the number of spherical harmonics (see below). Another advantage of this program is that it builds models using real amino acids and can therefore apply energy minimization to the selected conformations. Finally, because it is open source, potentials from other methods can be inserted according to the user's needs. Thus, by combining SAXS and NMR data using DADIMODO, Aliprandi et al. could describe the spatial organization of, and the interactions between, the different subdomains of the ribosomal protein S1, shedding light on the structural events occurring during RNA binding [83].

The extraordinary interest in these approaches is best exemplified by the quaternary structure of full-length p53, which was determined using SAXS coupled with the crystal structures of the core and tetramerization domains. The major tumor suppressor p53 is made of a disordered N-terminal transactivation domain (TAD), a disordered C-terminal regulatory domain, and two folded domains whose structures have been solved: the core DNA-binding domain and the tetramerization domain, separated by a linker. Using BUNCH, a representative structure of full-length p53 could be reconstructed by modeling the backbone of all the unstructured regions absent from the crystal structures. Drastic changes in the tertiary and quaternary structure of p53 were observed between the full-length protein free in solution and the DNA-bound protein [84] (Fig. (5)), and even in a ternary complex involving DNA and the Taz2 domain of p300 bound to p53-TAD [85]. These unprecedented observations paved the way to a novel understanding of the mode of action of p53.

This approach might be considered somewhat restrictive, because a single conformation cannot provide a comprehensive view of the ensemble of conformations explored by a flexible object. Nevertheless, a single conformation may represent the conformational properties of the object extremely well. For example, the structure of the full-length cellulase Cel48F, composed of a catalytic module tethered through a linker to a small dockerin domain, was modeled using CREDO [86], a precursor of the more elaborate program BUNCH. CREDO was designed to model missing regions in proteins whose crystal structures were incomplete because of flexible or disordered regions. Starting from the crystal structure of the catalytic domain, the program modeled the structure of the linker and of the dockerin region from the experimental scattering curve of the entire protein [77]. Several independent runs led to models that all exhibited a stretched region consistent with the number of residues of the linker following the catalytic domain, and a small folded globular region that was remarkably superimposable on the NMR structure of a homologous dockerin domain. These models differed from each other only in the orientation of the stretched region, suggesting fluctuating conformations of the linker region. In addition, all these models fitted the experimental scattering curve perfectly.

Modeling the conformation of the disordered region can be useful even when the rigid domains are very short. Models of the 70-residue-long disordered salivary protein IB5 were constructed using BUNCH, with only three short segments of polyproline repeats in the sequence modeled as rigid bodies. The models generated by 20 independent runs of BUNCH displayed different conformations but with recurrent features (for example, extended conformations with large loops), and all were perfectly compatible with the experimental data [87].

This approach thus provides a single but highly relevant 
conformation, which is representative of the astronomical 
number of possibilities explored by the protein. Most of the 
time, this unique conformation describing the protein and its 
disordered region(s) is entirely sufficient and extremely 
valuable to answer the initial questions regarding the struc- 
tural organization, the possible internal or external interac- 
tions with other domains or ligands, and the putative coordi- 
nation or synergy of the different domains within the full- 
length protein or in complex. 

5.3. Comparison with High-Resolution Structural Models 

Fig. (5). 3D models of p53. SAXS models of free (a) and DNA-bound (b) p53 in solution from rigid-body analysis, with the missing fragments added by BUNCH. Both models are shown in two orthogonal orientations. Core (green and blue) and tetramerization (red) domains are shown as cartoon representations, with core domains binding to the same half-site shown in the same color. Flexible connecting linkers (gray), N termini (pink), and C termini (yellow) are shown as semitransparent space-filled models. Models of the flexible regions are approximations that illustrate their global structural properties rather than defined conformations. (Figure taken from [151]).

While low-resolution models can be built using the modeling approaches described above, high-resolution structures or models of the protein or complex may be available or may have been built from high-resolution data. Unlike many other biophysical techniques, one of the major advantages of SAXS lies in the possibility of calculating the theoretical scattering curve of a structural model and comparing it directly with the experimental data.
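A minimal sketch of such a direct comparison, assuming that the experimental curve comes with per-point standard errors σ and that the model curve only needs an overall scale factor c. Programs such as CRYSOL also adjust hydration-shell and excluded-volume parameters, which are ignored here; the function name and the N - 1 normalization of the reduced χ² are assumptions.

```python
import numpy as np

def chi2_with_scale(i_exp, sigma, i_model):
    """Reduced chi-square between experimental and model intensities after
    fitting the scale factor c minimizing sum(((i_exp - c * i_model) / sigma)**2)."""
    i_exp, sigma, i_model = (np.asarray(a, dtype=float) for a in (i_exp, sigma, i_model))
    w = 1.0 / sigma**2
    # Closed-form weighted least-squares solution for the scale factor
    c = np.sum(w * i_exp * i_model) / np.sum(w * i_model**2)
    resid = (i_exp - c * i_model) / sigma
    chi2 = np.sum(resid**2) / (len(i_exp) - 1)
    return chi2, c
```

A reduced χ² close to 1 indicates that the model reproduces the data within the experimental noise.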

Several programs have been developed for this purpose, differing essentially in their description of the hydration shell surrounding the protein, which affects the quality of the fit at large q values. Up to q ≈ 0.3 Å⁻¹, the different methods generally yield similar results. CRYSOL, developed by Svergun, was the only available program for over a decade and is still the most widely used. CRYSOL is moreover extremely fast and user-friendly [88]. CRYSOL calculates the spherically averaged scattering curve using multipole expansions in spherical harmonics. However, CRYSOL is limited by the number of spherical harmonics (maximum 50) and thereby by the size of the object under study. Therefore, when using CRYSOL for a protein with a large maximum dimension, which is typical of IDPs, the number of harmonics should be set to the maximum value, or another method should be considered. CRYSOL also assumes an implicit hydration layer of constant (but adjustable) density and fixed thickness. With improvements in instrumentation and the higher resolution now attained in experimental scattering curves, most of the newly developed programs consider explicit solvent in their calculations, leading to better fits, especially in the wide-angle regime. This often comes at a higher computational cost, as described for AXES (webserver: http://spin.niddk.nih.gov/bax/nmrserver/saxs1/) [89]; the cost depends on the algorithm selected to speed up the calculations. While SASSIM uses a multipole expansion [90], ORNL SAS uses a Monte Carlo method (available at: http://www.ornl.gov/sci/csd/Research_areas/MS_csmb_comp_methods.htm) [91]. FoXS (webserver: http://modbase.compbio.ucsf.edu/foxs/about.html) uses the Debye formula to calculate intensities from atomic form factors, to which it adds a term representing the displaced solvent and another term, proportional to the solvent-accessible surface, that accounts for the contribution of the hydration water [92]. Other methods
cleverly use a coarse-grained approach, taking advantage of the low resolution of SAXS to significantly decrease the computation time. Among them, the program Fast-SAXS (available at: http://thallium.bsd.uchicago.edu/RouxLab/saxs.html) [93, 94] proposes a more realistic description of the water shell, based entirely on an atomistic description of water from molecular dynamics simulations. Another interesting approach has been proposed by Poitevin et al. [95] with the program AquaSAXS (webserver: http://lorentz.immstr.pasteur.fr/aquasaxs.php), in which the AquaSol method [96] is used to describe the hydration water as an assembly of self-orienting dipoles of variable density on a grid instead of a continuous dielectric medium. Finally, Stovgaard et al. [97] successfully reproduced protein scattering profiles, even in the wide-angle regime, using coarse-grained protein models and the Debye formula. However, their method did not describe the hydration layer surrounding the protein.
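To make the Debye formula concrete, the sketch below evaluates I(q) = Σᵢ Σⱼ fᵢ fⱼ sin(q rᵢⱼ)/(q rᵢⱼ) for point scatterers with q-independent form factors, as in a coarse-grained model. Unlike FoXS, it contains no displaced-solvent or hydration term, so it only illustrates the in-vacuo part of the calculation; the function name is hypothetical.

```python
import numpy as np

def debye_intensity(q, coords, f=None):
    """Debye formula I(q) = sum_ij f_i f_j sin(q r_ij) / (q r_ij) for point
    scatterers with q-independent form factors f (default 1 for every bead)."""
    q = np.atleast_1d(np.asarray(q, dtype=float))
    coords = np.asarray(coords, dtype=float)
    f = np.ones(len(coords)) if f is None else np.asarray(f, dtype=float)
    # All pairwise distances r_ij (including i == j, where r = 0)
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    ff = f[:, None] * f[None, :]
    qr = q[:, None, None] * r[None, :, :]
    # np.sinc(x) = sin(pi x) / (pi x), so sinc(qr / pi) = sin(qr) / qr, equal to 1 at qr = 0
    return np.sum(ff[None, :, :] * np.sinc(qr / np.pi), axis=(1, 2))
```

Memory grows as (number of q values) × N², so a full-atom model would call for chunking or a histogram of distances, which is one reason coarse-graining speeds up the calculation.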

In the case of an IDP, it is still preferable to describe explicitly the water molecules surrounding the protein when comparing its theoretical scattering curve with the experimental data. Comparing the results obtained with several of these methods should help identify the best strategy to enhance the quality of the fit and to refine and validate the atomic models, keeping in mind that, given the low resolution of SAXS, interpreting SAXS data alone at the atomic level is meaningless.

These programs, which calculate the scattering curve from atomic models, may be particularly useful for confronting experimental or modeled atomic structures with the structure in solution observed by SAXS. This was the case for the nuclear transcriptional activator protein TAT of the human immunodeficiency virus (HIV). TAT is an intrinsically disordered protein of ~100 residues and has long been at the center of antiviral therapeutic strategies because of its central role in viral replication. TAT is also a promising candidate antigen for anti-HIV vaccination. Several highly controversial structures of TAT from different strains of the virus have been solved by NMR, and the atomic coordinates have been deposited in the Protein Data Bank (PDB) [98-100]. On the other hand, a more recent, thorough NMR study showed that the protein was highly disordered, with no detectable residual structure or structural restraints and with characteristics similar to a random coil [101]. Using SAXS, it has been possible to test the validity of the structures deposited in the PDB by comparing their corresponding scattering profiles with synchrotron scattering data [102]. These NMR structures were inconsistent with both the experimental scattering profile (Fig. (6)) and the dimensions inferred from the SAXS curve (Rg, Dmax). Conversely, the SAXS data confirmed the study of Shojania and O'Neil [101] by showing that TAT is a disordered random coil [102].

Likewise, these programs can help build and validate models, as for the complex of the small intrinsically disordered thymosin-β4, which folds upon binding to G-actin and thus sequesters G-actin and regulates filament assembly [103]. Crystal structures of monomeric G-actin in the presence of polymerization inhibitors could be obtained only with the N-terminal or C-terminal half of thymosin-β4. These crystal structures were combined to construct an atomic model of G-actin in complex with full-length thymosin-β4. The theoretical scattering profile of this model, calculated using CRYSOL, fitted the experimental SAXS curve of this complex perfectly [103], supporting the validity of this model and of the functional interpretation inferred from it.

Finally, as discussed below, these programs are also needed
to calculate the scattering curves of the numerous atomic
models that make up the structural ensembles used to
describe the distribution of conformations sampled by the
protein in solution.

5.4. Distribution of Conformations 

Intrinsically disordered proteins are clearly highly dy- 
namic and do not exist as a single conformation in solution, 
but as interconverting conformers. Even when bound to a 
partner, they can still remain highly fluctuating, including on 
the interaction site, leading to what Tompa and Fuxreiter call 
fuzzy complexes [104]. 

Fig. (6). Use of CRYSOL to compare atomic structures with the structure in solution observed by SAXS. Comparison of experimental SAXS data from HIV-TAT (black line) with the theoretical scattering curves of published TAT structures (PDB codes 1TAC, red dotted line; 1TIV, blue dashed line; 1FJW, grey dash-dot line; 1K5K, solid green line), calculated using the CRYSOL program, showing the discrepancy between the PDB structures and the structure in solution observed by SAXS. The poor statistics of the experimental curve are due to the low concentration of the protein (< 1 mg/mL) (Figure adapted from [102]).

Most spectroscopic techniques monitor the average signal
arising from this multitude of conformations. A scattering
pattern likewise contains the contributions of all the
different conformations existing in solution. The approaches using
SAXS described above often retrieve a single shape or con- 
formation of the protein from this scattering pattern and do 
not describe the ensemble of the conformations. In some 
cases, it is even impossible to describe the scattering pattern 
with a single conformation. This is particularly striking when
one examines the distance distribution functions of large
folded domains tethered by a long, flexible linker, as for the
PCNA-Msh6-Msh2 complex (Fig. (7)) [74] or for the chimeric
double cellulase Cel6AB, a pioneering case in which the
distribution of conformations of an intrinsically disordered
region was estimated using SAXS [105]. In such cases, the
aim is to find a distribution of conformations that agrees
with the experimental data. Furthermore, when structural and
dynamic information from other complementary techniques is
available and can be combined with SAXS data, it may be
worth trying to gain further insight into the dynamics of the
flexible regions of the IDP and to establish the ensemble of
populations existing in solution, particularly for
multidomain proteins containing disordered regions of any
length. Deciphering the ensemble of conformers that the
protein can reach is crucial: it would allow a comprehensive
understanding of the energy landscape explored by
disordered proteins and could provide insights into
conformers that differ significantly from the average or
most stable conformation but may play a critical role in the
function of the protein.
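Why a single conformer cannot reproduce such P(r) functions can be illustrated with a toy calculation. The sketch below, using NumPy and hypothetical coordinates, histograms the pairwise distances of a few "dumbbell" conformers (two compact domains held at different separations) and averages the normalized P(r) functions over the ensemble; the average spreads the interdomain peak over the whole range of separations, which no individual conformer does. It assumes identical point scatterers and conformers of equal size, so form factors and solvent are ignored.

```python
import numpy as np

def pair_distance_distribution(coords, r_edges):
    """Histogram estimate of P(r) for one conformer made of identical
    scatterers, normalized so that it integrates to 1 over r."""
    i, j = np.triu_indices(len(coords), k=1)
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    hist, _ = np.histogram(d, bins=r_edges)
    return hist / (hist.sum() * np.diff(r_edges))

def ensemble_pr(conformers, r_edges, weights=None):
    """Population-weighted average of the per-conformer P(r) functions."""
    prs = np.array([pair_distance_distribution(c, r_edges) for c in conformers])
    if weights is None:
        weights = np.full(len(conformers), 1.0 / len(conformers))
    return np.asarray(weights) @ prs

# Hypothetical dumbbells: two 20-bead domains whose centers are 30, 60 or 90 A
# apart, mimicking folded domains tethered by a flexible linker.
rng = np.random.default_rng(1)
domain = rng.normal(scale=5.0, size=(20, 3))
conformers = [np.vstack([domain, domain + [s, 0.0, 0.0]]) for s in (30.0, 60.0, 90.0)]
r_edges = np.arange(0.0, 161.0, 2.0)
p_single = [pair_distance_distribution(c, r_edges) for c in conformers]
p_avg = ensemble_pr(conformers, r_edges)
```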

Retrieving the ensemble of conformations adopted by a 
protein from experimental data is quite challenging. The
number of degrees of freedom is very large compared with the
constraints provided by the experimental data, which
inexorably leads to a degenerate solution. Consequently, no
unique solution can be obtained; on the contrary, many
different ensembles may be consistent with the data. This
problem is even more acute for SAXS, which is already an
underdetermined technique. Overfitting the data is therefore
a serious pitfall that must be avoided by all possible
means [106].



Several experimental approaches, including NMR (PREs, 
RDCs, NOEs), FRET, and SAXS, have been used to build 
ensembles of structures that describe the dynamic properties 
of IDPs [107, 108]. Reviewing all these methods is beyond 
the scope of the present review, and we will focus here only 
on those techniques that use SAXS data, either exclusively 
or in combination with other experimental measurements. 

The strategy to establish a distribution of conformations 
in accordance with experimental SAXS data can be de- 
scribed by a general scheme in three main steps, each step 
having its own specific difficulties: (i) generating a compre- 
hensive library of conformers, (ii) calculating the theoretical 
scattering profile of each of these conformers, and (iii)
selecting a subset of these conformers whose ensemble-averaged
scattering curve best fits the data.

The first step is not specific to SAXS. In particular, the 
recent advances in NMR, including RDCs and PREs, have 
prompted the development of programs that generate wide
pools of structures to reproduce biophysical data obtained on
highly dynamic macromolecules [106, 108]. The main
difficulty here is to generate a broad enough pool of
conformers in a reasonable computing time. Molecular
dynamics (MD) generates numerous atomic structures along a
trajectory using adequate force fields. However, depending on
the size of the protein, MD may not sample a sufficiently wide
library of conformers in a reasonable computing time, given
the large conformational space explored by intrinsically
disordered proteins. Several strategies have therefore been
used to circumvent this difficulty. A typical workaround is
found in the program Flexible-Meccano, which generates
coarse-grained, realistic atomic models using a Monte Carlo
technique and backbone dihedral angles allowed in
Ramachandran space [109]. Approaches using this program
have been successfully applied to many IDPs to reproduce
SAXS or RDC data [110-114].
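As a rough illustration of the first step, the following sketch builds a pool of coarse-grained Cα chains as self-avoiding random walks with a fixed 3.8 Å virtual bond and reports their radii of gyration. It is a deliberately simplified toy: unlike Flexible-Meccano or RanCh, it does not sample residue-specific Ramachandran angles, and the 4.0 Å exclusion distance, chain length and pool size are arbitrary assumptions.

```python
import numpy as np

CA_CA = 3.8  # virtual C-alpha to C-alpha distance, Angstrom

def random_ca_chain(n_res, rng, min_dist=4.0, max_tries=1000):
    """Self-avoiding random walk of C-alpha positions (toy model).

    Each bead is placed CA_CA from the previous one in a uniformly random
    direction; trial positions closer than min_dist to any earlier
    non-adjacent bead are rejected.
    """
    coords = np.zeros((n_res, 3))
    for i in range(1, n_res):
        for _ in range(max_tries):
            v = rng.normal(size=3)                      # isotropic direction
            trial = coords[i - 1] + CA_CA * v / np.linalg.norm(v)
            # beads 0 .. i-2 are the non-adjacent predecessors of bead i
            if np.all(np.linalg.norm(coords[: i - 1] - trial, axis=1) >= min_dist):
                coords[i] = trial
                break
        else:
            raise RuntimeError("chain trapped; retry with another random seed")
    return coords

def radius_of_gyration(coords):
    """Rg of equally weighted beads."""
    centered = coords - coords.mean(axis=0)
    return np.sqrt((centered ** 2).sum(axis=1).mean())

rng = np.random.default_rng(0)
pool = [random_ca_chain(100, rng) for _ in range(100)]   # hypothetical pool size
rg_pool = np.array([radius_of_gyration(c) for c in pool])
print(f"Rg of pool: mean {rg_pool.mean():.1f} A, "
      f"range {rg_pool.min():.1f}-{rg_pool.max():.1f} A")
```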

Fig. (7). Comparison of experimental and calculated distance distribution functions of flexible multidomain proteins. P(r) functions calculated for four randomly generated models of Msh2-Msh6 linked to PCNA via random peptides with different interdomain distances reveal that no single conformer can account for the observed P(r) curve of the Msh2-Msh6-PCNA complex. The red curve with the long tail corresponds to the experimental P(r) curve of the Msh2-Msh6-PCNA complex. (Figure adapted from [74]).

The difficulties of the second step concern the accuracy
of the theoretical scattering curve calculated for each model,
in particular the correct estimation of the contribution of the
hydration layer, as discussed above. The third and most
critical step must cope with the redundancy of possible
solutions when selecting several structures whose average
signal fits the data, and safeguards have to be defined to
prevent overfitting.
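For the second step, the simplest way to compute a theoretical curve from coordinates is the Debye formula, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij)/(q r_ij). The sketch below applies it to a coarse-grained model with one identical scatterer per residue, and shows that the ensemble curve is the population-weighted average of the individual curves, because in dilute solution the conformers scatter independently. It deliberately omits the excluded solvent and the hydration layer, which programs such as CRYSOL treat explicitly, so it is an illustration rather than a tool for quantitative fitting.

```python
import numpy as np

def debye_profile(coords, q, f=1.0):
    """Debye formula for identical scatterers:
    I(q) = sum_ij f^2 sin(q r_ij) / (q r_ij).

    Toy version: one scattering center per residue with a q-independent form
    factor f; solvent displacement and the hydration shell are ignored.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))                  # (N, N) distances
    qr = np.asarray(q, dtype=float)[:, None, None] * r     # (n_q, N, N)
    # np.sinc(x) = sin(pi x) / (pi x), so sin(qr)/(qr) = np.sinc(qr / pi);
    # it also returns 1 for the i == j terms, where r = 0.
    return f**2 * np.sinc(qr / np.pi).sum(axis=(1, 2))

def ensemble_profile(conformers, q, weights=None):
    """Population-weighted average of the individual conformer profiles."""
    profiles = np.array([debye_profile(c, q) for c in conformers])
    if weights is None:
        weights = np.full(len(conformers), 1.0 / len(conformers))
    return np.asarray(weights) @ profiles
```

For two scatterers a distance d apart, the formula reduces to 2 + 2 sin(qd)/(qd), and I(0) equals the square of the number of scatterers, which is a convenient sanity check.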

Several program suites have been developed that inte- 
grate the three above-described steps to generate structural 
ensembles compatible with the SAXS data. These programs 
adopt different strategies at each step, especially concerning 
the choices made to minimize overfitting. 

The program suite EOM, Ensemble Optimization Method 
[115], is currently the most popular due to its simple inter- 
face. It is widely used when an IDP is examined using 
SAXS. The first step of the procedure is performed by the 
program RanCh (Random Chain), which builds a pool of 
random models of IDPs or multidomain proteins with linkers 
from the sequence of the full-length protein and the atomic 
coordinates of the folded domains (if any). The disordered
regions are modeled with Cα chains using a quasi-Ramachandran
plot. The authors recommend generating 10,000 structures.
The theoretical scattering profiles of all these structures are
then calculated using CRYSOL. The third program, GAJOE
(Genetic Algorithm Judging Optimization of Ensembles),
uses a genetic algorithm to select an ensemble of scattering
curves (and thereby of structures) whose average fits the
data. Typically, several dozen individual scattering curves
are selected. The results are presented as histograms of the
radii of gyration and of the Dmax of the selected protein
models, compared with the distributions of Rg and Dmax of
the initial random pool. Instead of using RanCh, the
user can start from a pool of conformers generated by any 
other method, thereby adding restraints arising from other 
experimental results, such as PREs. The pdb files of the 
models of the ensemble that give the best fit are also pro- 
vided. These models do not necessarily represent the struc- 
tures adopted by the protein but are just models whose aver- 
age calculated scattering curves best fit the data. In the case 
of an entirely disordered protein, EOM provides an estimate
of the conformational landscape and of the shift in the
dimensions of the ensemble of conformations reached by the
protein with respect to those of a random coil, much as when
one compares the dimensions and structural parameters
inferred from the experimental curves (Rg, Dmax, statistical
and contour lengths) with those of a random coil. EOM thus
provides an alternative way to reveal structural restraints
along the polypeptide chain at the global scale. In the case of
multidomain proteins, EOM can provide information on the
flexibility of the interdomain linkers by comparing them with
a random distribution or with variant linkers [116, 117], and it
is particularly productive when coupled with NMR data [45].
The fluctuations of the linkers in the multidomain ribosomal
L12 protein were thus investigated, and an ensemble model
of the structure and reorientational dynamics of the protein
was obtained by reconciling SAXS data with NMR
relaxation data, which enabled a detailed description of the
structural propensities of the linkers [118]. In some cases,
EOM calculations yield bimodal distributions [119-121],
which may provide interesting insights into a possible
equilibrium between different preferred populations. A
prudent interpretation of such results at the functional level
is recommended here, and they would greatly benefit from
being consolidated by other biophysical techniques, such as
FRET.
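The selection step of EOM can be pictured with a toy genetic algorithm. In the sketch below, each individual is a fixed-size ensemble of indices into a pool of precomputed curves (duplicates allowed), fitness is the reduced χ² of the equally weighted average curve, the better half of the population survives unchanged, and children are produced by one-point crossover and random mutation. This illustrates the general idea only, not GAJOE's actual operators or default settings; the Gaussian-chain pool and the bimodal "experimental" curve are hypothetical.

```python
import numpy as np

def ensemble_chi2(members, curves, i_exp, sigma):
    """Reduced chi^2 of the equally weighted average of the selected curves,
    using the optimal scale factor."""
    avg = curves[members].mean(axis=0)
    w = 1.0 / sigma**2
    c = np.sum(w * i_exp * avg) / np.sum(w * avg**2)
    return np.sum(((i_exp - c * avg) / sigma) ** 2) / (i_exp.size - 1)

def genetic_select(curves, i_exp, sigma, ens_size=10, pop_size=40,
                   generations=150, mut_rate=0.1, seed=0):
    """Toy genetic algorithm; each individual is a multiset of pool indices."""
    rng = np.random.default_rng(seed)
    n_pool = len(curves)
    pop = rng.integers(0, n_pool, size=(pop_size, ens_size))
    n_keep = pop_size // 2
    for _ in range(generations):
        fit = np.array([ensemble_chi2(ind, curves, i_exp, sigma) for ind in pop])
        parents = pop[np.argsort(fit)[:n_keep]]            # elitist truncation
        children = np.empty((pop_size - n_keep, ens_size), dtype=pop.dtype)
        for k in range(len(children)):
            p1, p2 = parents[rng.integers(0, n_keep, size=2)]
            cut = rng.integers(1, ens_size)                 # one-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            mutate = rng.random(ens_size) < mut_rate        # random replacement
            child[mutate] = rng.integers(0, n_pool, size=mutate.sum())
            children[k] = child
        pop = np.vstack([parents, children])
    fit = np.array([ensemble_chi2(ind, curves, i_exp, sigma) for ind in pop])
    best = np.argmin(fit)
    return pop[best], fit[best]

def debye_gaussian_chain(q, rg):
    """Ideal Gaussian-chain form factor, I(0) = 1 (stand-in for CRYSOL curves)."""
    x = (q * rg) ** 2
    return 2.0 * (np.expm1(-x) + x) / x**2

q = np.linspace(0.01, 0.2, 80)
rg_pool = np.linspace(10.0, 60.0, 51)                      # hypothetical pool
curves = np.array([debye_gaussian_chain(q, rg) for rg in rg_pool])
i_exp = 0.5 * (debye_gaussian_chain(q, 20.0) + debye_gaussian_chain(q, 45.0))
sigma = 0.01 * i_exp
best, chi2 = genetic_select(curves, i_exp, sigma)
print(np.sort(rg_pool[best]), chi2)   # Rg values of the selected ensemble
```

Keeping the best half of each generation guarantees that the best χ² never increases; histogramming `rg_pool[best]` against `rg_pool` mirrors the way EOM reports its results.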

The recent program Broad Ensemble Generator with Re- 
weighting (BEGR), initially developed to interpret NMR 
chemical shifts of proteins, appears promising because it 
generates realistic structural ensembles in a broader confor- 
mational space, by applying only steric constraints. The 
probability (i.e., weight) for each structure in the pool is determined
such that the average simulated spectrum best fits
the experimental spectrum using a Metropolis Monte Carlo 
approach [122]. 
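The Metropolis weighting idea described for BEGR, which was developed for NMR spectra, can be transposed to scattering curves as in the sketch below. It keeps a vector of log-weights, perturbs one entry per step, and accepts moves with the Metropolis criterion on a χ²-like energy. The temperature, step size and the assumption that the curves are already on the same absolute scale as the data are all hypothetical choices, and the code is not BEGR.

```python
import numpy as np

def metropolis_weights(curves, i_exp, sigma, n_steps=20000, step=0.3,
                       temperature=0.1, seed=0):
    """Toy Metropolis Monte Carlo search for ensemble weights.

    State: log-weights, softmax-normalized so that the weights sum to 1.
    Move: a Gaussian perturbation of one log-weight (symmetric proposal).
    Energy: mean squared error-normalized residual of the weighted average.
    """
    rng = np.random.default_rng(seed)
    n = len(curves)

    def energy(logw):
        w = np.exp(logw - logw.max())
        w /= w.sum()
        return np.mean(((i_exp - w @ curves) / sigma) ** 2), w

    logw = np.zeros(n)                      # start from uniform weights
    e, w = energy(logw)
    best_e, best_w = e, w
    for _ in range(n_steps):
        trial = logw.copy()
        trial[rng.integers(n)] += rng.normal(scale=step)
        e_new, w_new = energy(trial)
        # accept downhill moves always, uphill moves with Boltzmann probability
        if e_new <= e or rng.random() < np.exp(-(e_new - e) / temperature):
            logw, e, w = trial, e_new, w_new
            if e < best_e:
                best_e, best_w = e, w
    return best_w, best_e

def debye_gaussian_chain(q, rg):
    """Ideal Gaussian-chain form factor, I(0) = 1 (hypothetical test curves)."""
    x = (q * rg) ** 2
    return 2.0 * (np.expm1(-x) + x) / x**2

q = np.linspace(0.01, 0.2, 80)
rg_pool = np.arange(10.0, 61.0, 5.0)        # hypothetical pool of 11 conformers
curves = np.array([debye_gaussian_chain(q, rg) for rg in rg_pool])
i_exp = 0.7 * debye_gaussian_chain(q, 20.0) + 0.3 * debye_gaussian_chain(q, 45.0)
sigma = 0.01 * i_exp
w, chi2 = metropolis_weights(curves, i_exp, sigma)
print(np.round(w, 3), round(chi2, 3))
```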

The program suite SASSIE [123] was originally
written to generate a set of structures for the HIV Gag pro- 
tein consistent with SANS data and neutron reflectivity data. 
SASSIE is executed from within the Visual Molecular Dy- 
namics (VMD) program [124] and utilizes molecular dynam- 
ics with CHARMM force-fields to generate large ensembles 
of structures by randomly varying backbone dihedral angles 
with energetically allowable values. Distance constraints 
such as those provided by NOEs or other techniques may be 
applied when generating these structures. Each structure is 
then energy minimized using the program NAMD [125]. The 
theoretical scattering curve of these structures is calculated 
using CRYSON [126] or Xtal2Sas [127] for SANS, and 
CRYSOL [88] for SAXS. The scattering curves are then 
analyzed and compared to the experimental profile through 
the χ (discrepancy between the theoretical and the experimental
profiles) and the radius of gyration. No particular
weighting scheme is applied, so that a single structure or a 
linear combination of several structures may be selected as 
the best representative structures reproducing the experimen- 
tal data. 
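The unweighted screening described above can be expressed in a few lines. The sketch below assumes that a χ value and an Rg have already been computed for every structure (for example from CRYSOL fits) and simply keeps the structures within a χ cutoff and an Rg window, ranked by χ. The function name, thresholds and numbers are hypothetical and do not reproduce SASSIE's own analysis modules.

```python
import numpy as np

def filter_by_chi_and_rg(chi, rg, rg_exp, rg_tol, chi_max):
    """Indices of structures with chi <= chi_max and |Rg - Rg_exp| <= rg_tol,
    ordered from best to worst chi; no ensemble weights are fitted."""
    chi = np.asarray(chi, dtype=float)
    rg = np.asarray(rg, dtype=float)
    keep = np.flatnonzero((chi <= chi_max) & (np.abs(rg - rg_exp) <= rg_tol))
    return keep[np.argsort(chi[keep], kind="stable")]

# Hypothetical per-structure results for a small MD-generated pool
chi = [3.0, 1.2, 0.9, 1.1, 1.4]
rg = [25.0, 31.0, 40.0, 29.0, 30.5]
selected = filter_by_chi_and_rg(chi, rg, rg_exp=30.0, rg_tol=2.0, chi_max=1.5)
print(selected)   # [3 1 4]
```

Structure 2 has the lowest χ but is rejected because its Rg lies outside the window, which shows why both criteria are checked.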

Several other approaches have recently been developed 
with a different philosophy. All of the following approaches 
try to prevent overfitting of the data by selecting an ensem- 
ble of the minimal size that best fits the data. Most of them 
also use further restraints or strategies to strengthen and as- 
sess the validity and robustness of the solution. 

The program Minimal Ensemble Search, MES (freely
available for academic use at
http://bl1231.als.lbl.gov/saxs_protocols/mes.php), aims to
determine the minimal ensemble that best fits the data [27].
A range of random structures is generated by the program
BILBOMD, which combines MD at high temperature (to avoid
trapping in local minima) with rigid-body modeling of the
globular domains. Distance constraints can also be added.
This approach shares some similarities with the constrained
solution scattering modeling method developed by Stephen
Perkins [128, 129], except that the latter tends to select only
one best-fit conformer, whose atomic coordinates are then
deposited in the Protein Data Bank. The theoretical scattering
curves of the models yielded by BILBOMD are then
computed using CRYSOL, and the minimal ensemble is
selected by a Monte Carlo genetic algorithm. Overfitting is
limited by the realistic conformational models explored by
MD and by the selection of a minimum number of structures
that deconvolute the SAXS data, usually two to five
weighted conformations. The flexibility of the protein is
assessed by comparing the root mean square deviation
(rmsd), Rg and Dmax between the selected models
and the best-fit model.
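The idea behind a minimal-ensemble search can be illustrated on a very small pool. MES itself uses a genetic algorithm; the sketch below instead enumerates all subsets of up to `max_size` curves, fits their weights by linear least squares (discarding fits with negative weights, a simplification of non-negative least squares), and stops at the smallest size whose best χ² falls below a tolerance. Exhaustive enumeration is only feasible for tiny pools, and the pool, tolerance and synthetic data are assumptions.

```python
import itertools
import numpy as np

def fit_subset(sub_curves, i_exp, sigma):
    """Weighted least squares I_exp ~ sum_k x_k I_k (the x_k absorb the scale)."""
    a = (sub_curves / sigma).T                  # (n_q, k) design matrix
    b = i_exp / sigma
    x, *_ = np.linalg.lstsq(a, b, rcond=None)
    chi2 = np.sum((a @ x - b) ** 2) / (b.size - 1)
    return x, chi2

def minimal_ensemble(curves, i_exp, sigma, max_size=3, chi2_tol=1.5):
    """Smallest subset (exhaustive search) whose best fit with non-negative
    weights has chi2 <= chi2_tol; returns (indices, fractions, chi2)."""
    best = None
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(len(curves)), k):
            x, chi2 = fit_subset(curves[list(subset)], i_exp, sigma)
            if np.any(x < 0):
                continue                        # negative populations are unphysical
            if best is None or chi2 < best[2]:
                best = (subset, x / x.sum(), chi2)
        if best is not None and best[2] <= chi2_tol:
            return best                         # stop at the smallest adequate size
    return best

def debye_gaussian_chain(q, rg):
    """Ideal Gaussian-chain form factor, I(0) = 1 (hypothetical test curves)."""
    x = (q * rg) ** 2
    return 2.0 * (np.expm1(-x) + x) / x**2

q = np.linspace(0.01, 0.2, 80)
rg_pool = np.arange(10.0, 61.0, 5.0)            # hypothetical pool of 11 conformers
curves = np.array([debye_gaussian_chain(q, rg) for rg in rg_pool])
i_exp = 0.7 * debye_gaussian_chain(q, 20.0) + 0.3 * debye_gaussian_chain(q, 45.0)
sigma = 0.01 * i_exp
subset, weights, chi2 = minimal_ensemble(curves, i_exp, sigma)
# expected: the Rg = 20 A and 45 A conformers with fractions of about 0.7 and 0.3
print(rg_pool[list(subset)], np.round(weights, 3), chi2)
```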

The Basis-Set Supported SAXS (BSS-SAXS) reconstruction
is also an integrative approach [28]. Coarse-grained molecular
dynamics simulations are first performed with different
initial conditions, incorporating constraints such as the
interactions between different domains of the protein,
adjusted to match the Kd, to ensure proper sampling of the
accessible conformational space. The theoretical scattering
curves of the different models are calculated using
Fast-SAXS, as described above. A Bayesian-based Monte Carlo
procedure yields fractional populations with more accurate
statistics. The originality of the approach also lies in the
selection not of a limited number of discrete conformers, but
of a limited number of representative families of states, each
clustering a large ensemble of configurations (Fig. (8)). This
approach elucidated the conformational states of assembly of
the multidomain protein Hck, from the family of Src kinases.
Importantly, it also revealed the dynamic
equilibrium between several closed inactive and open active 
conformations regulated by the interaction forces between 
the different domains (SH3, SH2 and linkers) and how this 
equilibrium is perturbed and shifted towards a family of con- 
formations upon binding to different signaling peptides. 
These results provided a critical understanding of the 
mechanisms of regulation of this family of kinases, which is 
involved in many vital and cancer-related signaling pathways 
[28, 130]. 

Another method, Ensemble Refinement of SAXS (EROS),
has recently been developed to determine the dynamic
conformational properties of biomolecular assemblies
containing intrinsically disordered segments [30]. An initial 
ensemble of conformations is first generated using coarse- 
grained models, which are then elegantly refined by an en- 
ergy function optimized for protein binding in which the 
interactions between domains are treated at the residue level 
with appropriate energy potentials, such as electrostatic po- 
tentials or hydrophobic interactions. The theoretical scatter- 
ing curves of the models are calculated using an algorithm 
that is similar to CRYSOL but adapted to coarse-grained
rather than atomic models. A maximum entropy refinement
then selects the minimum number of weighted clusters of
structures consistent with the SAXS data, to prevent
overfitting. As the
generated models account for hydrophobic and electrostatic 
interactions, drastic conformational reorganizations upon 
increased salt concentrations of the endosome-associated
CHMP3, a key component of the ESCRT-III complex, could 
be described in detail. Electrostatic interactions between do- 
mains, which ensure an auto-inhibited compact closed con- 
formation of CHMP3, were disrupted when shielded by high 
salt concentrations, leading to the active open and extended 
conformations, with a higher flexibility. Indeed, a minimum 
of 60 structures was required to account for the SAXS data 
of the open conformation at high salt concentrations, 
whereas an ensemble of only 6 clustered structures agreed 
with the SAXS data at low salt concentrations [30, 31].
Similarly, the equilibrium between the open and closed
conformations of the heterotetrameric ESCRT-I complex was
deciphered using the same approach: an ensemble of six
structures was required to fit the scattering data together with
double electron-electron resonance data from spin-labeled
complexes, and the result was confirmed by FRET
measurements [131].
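The maximum-entropy principle invoked for EROS can be sketched with a generic objective of the form L(w) = χ²(w) + θ Σ_i w_i ln(w_i / w_i⁰): the χ² term pulls the weights towards the data, while the relative-entropy term keeps them close to the prior w⁰ unless the data require otherwise. The code below minimizes this objective by gradient descent with backtracking over softmax-parameterized weights. θ, the uniform prior, and the assumption that the curves are on the same absolute scale as the data are illustrative choices; the coarse-grained energy function and the clustering used by EROS are not reproduced.

```python
import numpy as np

def maxent_reweight(curves, i_exp, sigma, theta=1.0, n_iter=2000):
    """Toy maximum-entropy reweighting of a conformer pool.

    Minimizes L(w) = chi2(w) + theta * sum_i w_i ln(w_i / w0_i), where w0 is
    uniform, chi2 is the mean squared error-normalized residual of the
    weighted average curve, and w = softmax(a). Steps are accepted only if L
    does not increase (backtracking line search).
    """
    n, n_q = curves.shape
    w0 = np.full(n, 1.0 / n)

    def evaluate(a):
        w = np.exp(a - a.max())
        w /= w.sum()
        resid = (i_exp - w @ curves) / sigma
        log_ratio = np.log(np.clip(w, 1e-300, None) / w0)
        chi2 = np.mean(resid ** 2)
        return chi2 + theta * np.sum(w * log_ratio), chi2, w, resid, log_ratio

    a = np.zeros(n)
    loss, chi2, w, resid, log_ratio = evaluate(a)
    lr = 1.0
    for _ in range(n_iter):
        # dL/dw_i: data term plus relative-entropy term
        g = -2.0 * (curves / sigma) @ resid / n_q + theta * (log_ratio + 1.0)
        grad = w * (g - w @ g)                  # chain rule through the softmax
        while lr > 1e-12:
            trial = evaluate(a - lr * grad)
            if trial[0] <= loss:                # accept only non-increasing steps
                a = a - lr * grad
                loss, chi2, w, resid, log_ratio = trial
                lr = min(lr * 1.5, 1e6)
                break
            lr *= 0.5
    return w, chi2

def debye_gaussian_chain(q, rg):
    """Ideal Gaussian-chain form factor, I(0) = 1 (hypothetical test curves)."""
    x = (q * rg) ** 2
    return 2.0 * (np.expm1(-x) + x) / x**2

q = np.linspace(0.01, 0.2, 80)
rg_pool = np.arange(10.0, 61.0, 5.0)            # hypothetical pool of 11 conformers
curves = np.array([debye_gaussian_chain(q, rg) for rg in rg_pool])
i_exp = 0.7 * debye_gaussian_chain(q, 20.0) + 0.3 * debye_gaussian_chain(q, 45.0)
sigma = 0.01 * i_exp
w, chi2 = maxent_reweight(curves, i_exp, sigma, theta=1.0)
print(np.round(w, 3), chi2)
```

Because the relative entropy is zero for the uniform starting weights and never negative, a non-increasing L guarantees that the final χ² is no worse than that of the unweighted pool; increasing θ trades goodness of fit for weights closer to the prior.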

The program ENSEMBLE was originally written to de- 
scribe the ensemble of populations of a folded and unfolded 
N-terminal SH3 domain of drk co-existing under non- 
denaturing conditions [132]. Since then, it has been further 
developed [32, 133] and has been used for several intrinsi- 
cally disordered proteins [134-136]. ENSEMBLE utilizes 
several strategies to prevent overfitting. First, the program 
can account for a high number of restraints from many dif- 
ferent experiments, such as NMR chemical shifts, NOEs, J 
coupling constants, RDCs, PREs, tryptophan indole fluores- 
cence, hydrodynamic radius, and SAXS. Second, the mini- 
mum ensemble of structures compatible with all the data is 
selected to represent the conformational space explored by 
the protein and the variety of conformations that it may
attain. ENSEMBLE first employs the program TraDES [137]
to generate a wide range of statistically random structures,
taking into account secondary structure propensities if
necessary. For the ensemble minimization procedure,
experimental results from the various methods are converted
into energy values, distance restraints, or solvent-accessible
areas, depending on the information provided by the
technique. Theoretical scattering data of the generated models
are computed using CRYSOL. ENSEMBLE was thus used to
observe significant transient structures in the free disordered
protein Sic1 and to describe the highly dynamic interactions
between this protein and the Cdc4 subunit of a ubiquitin
ligase, together with the role of Sic1 phosphorylation in
interchangeable interactions [134, 138]. Forman-Kay's group
also investigated the structural propensities of several
intrinsically disordered regulators of protein phosphatase 1
(PP1) in the unbound state and, for inhibitor-2 (I-2), in
complex with PP1 [135]. Within the ensemble of selected
structures (~10-20 structures) in the free state compatible with
all the experimental restraints (chemical shifts, PREs, Rh,
SAXS), transient secondary structure elements were
observed; in addition, preformed structural motifs similar to
those in the bound state were present, with sufficient stability
to facilitate the interaction with PP1. This result thereby
supports the longstanding idea that the selection of a
prefolded conformer with pre-structured motifs (PreSMos; see
[139] in the same CPPS issue for a review) could be the
predominant mechanism for certain interactions, besides the
folding-upon-binding mechanism observed by Sugase et al.
[140]. Finally, the ensemble of dynamic structures of the
complex of PP1 and I-2 generated using ENSEMBLE was
examined. Consistent with the scattering curve of the
complex, these data revealed that I-2 remains largely
disordered upon complex formation. Nevertheless, transient
contacts were identified that were not observed in the partial
crystal structure of the complex, providing the first molecular
insights into the function of a region of I-2 that plays a critical
biological role in the interaction with PP1 [135].

Fig. (8). Representatives of the 9 families of conformational states of the Src kinase Hck. The program BSS-SAXS first generated numerous conformations and clustered them into 9 representative families of states ranging in architecture from fully to partially assembled and disassembled states, in size from compact to extended forms, and in interdomain separation from fully assembled to partially disassembled states. The catalytic domain is in blue, the SH2 domain in green, the SH3 domain in yellow and the linkers in red. Based on the experimental scattering data of free Hck in solution, Hck exists in different open and closed conformations stabilized by intramolecular interactions involving the SH2 and SH3 domains. The assembled conformation (state 1) is the major species (83%) and is in equilibrium with minor states 6 and 8. In the presence of two activating peptides, the scattering data indicate that only the open conformations corresponding to states 5 and 6 exist in solution. (Figure taken from [28]).

Determining the ensemble of conformations of a disordered
protein from SAXS data is therefore quite a challenge and
requires clever, rigorous procedures and cross-validation with
as many complementary techniques as possible to reduce the
redundancy of the solution. The information yielded by the
resulting models is, however, of crucial interest because it
provides unique information on the conformational dynamics
of IDPs (although not on the motional timescales) and on
transient or local structures that may be critical for the
activity of the protein. Restricting the description to a
minimal number of representative models of the distribution
of conformations prevents overinterpretation of the models.
All the authors discuss their results and agree that the
selected conformers represent neither the only conformations
nor the most stable conformations attained in solution.
Instead, the selected structures are rather a snapshot of a large
continuum of states. The global properties of the models and
their recurrent features are captured in these representative
conformers and may provide information on the presence of
local rigidities, transient structures, or accessible
conformational states that may be functionally relevant. These
refined ensembles increase the resolution of SAXS beyond a
simple overall shape or average global conformation and can
provide detailed submolecular information, even at atomic
resolution, provided that high-resolution data are available. A
plethora of information on the biological activity of
intrinsically disordered proteins can hence be inferred at the
molecular level, which can be extremely valuable for
deciphering their role in the cell.

6. PROTEIN INTERACTIONS AND SANS 

During the last decade, SAXS has become increasingly
successful in the study of biological macromolecules in
solution. Because of its tremendous potential for studying
macromolecular complexes with the contrast variation
method, small angle neutron scattering (SANS) is extremely
promising for the study of IDPs. Essentially, SANS provides
the same kind of information as SAXS, as described above.
The difference lies in the fact that X-rays interact with
electrons whereas neutrons interact with nuclei, which makes
it possible to use isotopic labeling, especially with hydrogen
and deuterium, which have very different scattering lengths.
By varying the D2O/H2O content of the solution, an object
can be rendered invisible: at a given D2O/H2O ratio, called
the matching point, its scattering length density is identical
to that of the solvent. This matching point differs for DNA,
RNA, lipids, polysaccharides, proteins, and perdeuterated
proteins. It is therefore possible to focus on only one part of
a macromolecule or complex by matching the rest of the
object at the corresponding D2O/H2O matching point.
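The matching point follows from a simple linear relation. Assuming ideal mixing, the solvent scattering length density varies linearly with the D2O fraction x, ρ_s(x) = ρ_H2O + x(ρ_D2O − ρ_H2O), and so, approximately, does that of a protein, because its labile hydrogens exchange with solvent deuterium; the matching point is where the two lines cross. The sketch below uses commonly quoted solvent values (−0.56 and 6.36 × 10^10 cm⁻²) together with hypothetical protein values of the order typically quoted for hydrogenated proteins, and it neglects incomplete exchange.

```python
# Scattering length densities in units of 1e10 cm^-2 (commonly quoted values)
RHO_H2O = -0.56
RHO_D2O = 6.36

def solvent_sld(x_d2o):
    """Solvent SLD for a D2O volume fraction x, assuming ideal linear mixing."""
    return RHO_H2O + x_d2o * (RHO_D2O - RHO_H2O)

def protein_sld(x_d2o, rho_in_h2o, rho_in_d2o):
    """Protein SLD, taken as linear in x because labile H exchange with solvent D."""
    return rho_in_h2o + x_d2o * (rho_in_d2o - rho_in_h2o)

def match_point(rho_in_h2o, rho_in_d2o):
    """D2O fraction at which protein and solvent SLDs are equal (zero contrast)."""
    return (rho_in_h2o - RHO_H2O) / ((RHO_D2O - RHO_H2O) - (rho_in_d2o - rho_in_h2o))

# Hypothetical SLDs of a hydrogenated protein in pure H2O and in pure D2O
x = match_point(1.8, 3.1)
print(f"match point ~ {100 * x:.0f}% D2O")   # match point ~ 42% D2O
```

The contrast, protein_sld(x, …) − solvent_sld(x), changes sign at this point, which is why a component can be made invisible while its partner, with a different SLD (for example a perdeuterated protein), remains visible.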

To our knowledge, very few studies have utilized SANS 
to analyze IDPs. The mammalian translation elongation
factor 1A (eEF1A) was shown to be significantly disordered
using SANS [141]. Similarly, the structural conformation of 
HIV-Gag, which is composed of several globular and coil 
domains, was elucidated using SANS [142]. In these two 
examples, SANS was employed in exactly the same way as 
SAXS would have been employed. More recently, the con- 
formational changes of HIV-Gag upon binding to a small 
nucleic acid were investigated by SANS at different D2O/ 
H2O contrasts [143]. Furthermore, a recent review has illus- 
trated the combined use of NMR, SAXS, and SANS using 
the contrast variation method to build a model of the tandem 
RNA recognition motif domains (RRM1 and RRM2) of the
human splicing factor U2AF65 bound to an oligonucleotide,
in which the flexibility of the linker tethering the RRM1 and
RRM2 domains was described using a large ensemble of 
conformations consistent with the RDC data and the X-ray 
and neutron scattering curves [144]. 

A new milestone has been reached with the recent study 
of Johansen et al. [145] in which SANS with contrast match- 
ing was used to investigate the effect of macromolecular 
crowding on the conformation of an IDP. They mixed per- 
deuterated N protein of bacteriophage lambda, a small intrin- 
sically disordered protein, with the small hydrogenated bo- 
vine pancreatic trypsin inhibitor (BPTI) as a crowding agent 
at increasing concentrations up to 130 mg/mL at the
matching point of 42% D2O, at which BPTI becomes
invisible. Their results suggest a compaction of the
disordered protein at relatively low macromolecular
crowding, as observed for random coil polymers in crowded
conditions [146, 147], but this effect apparently does not
increase under denser crowding conditions. This study
provides crucial answers to recurrent questions about the
behavior of IDPs in the crowded cell, which is likely to be
completely different from that in the relatively dilute test
tube. Furthermore, the use of contrast variation SANS on
complexes of two different proteins, one of which is an IDP,
promises exciting new advances in characterizing the
conformational changes that disordered proteins undergo
between the free state and the state bound to an unlabeled
protein partner.

7. CONCLUSION 

IDPs are particularly recalcitrant in structural studies, 
which has long hampered their structural characterization. 
SAXS and SANS are now widely recognized as indispensable
tools for analyzing these proteins. Since the introduction
of the protein trinity concept [148, 149], IDPs have been
associated with the random coil state of denatured proteins
and were mostly considered devoid of any significant
structural features. Lessons from recent SAXS studies,
especially those coupled with NMR data, revealed that
random coil-like IDPs differ significantly from the random
coil of the denatured state. Upon denaturation, all of the
interaction potentials between the residues of the
polypeptide chain, which stabilize the scaffold of the native
protein, are strongly altered or screened by the denaturing
conditions, leading to unfolding and a random coil set of
conformations. In intrinsically disordered proteins, by
contrast, the specific properties of the residues in the
sequence are not screened by any denaturing condition.
Although their strong sequence bias prevents them from
folding around a hydrophobic core and causes them to
maximally expose the polypeptide chain to the solvent [150],
residual local and even long-range weak interactions may
still occur. These transient or sometimes more stable
elements of structure are likely to play crucial roles in the
function of IDPs, for instance in the recognition process. The
most recent advances in SAXS and SANS instrumentation,
methodology, and computational data analysis enable one to
extract all, or almost all, of the information contained in the
scattering profile of an IDP that may describe these
important features. Whatever the degree of
disorder of an IDP, from fully disordered proteins to multi- 
domain proteins with only short disordered segments linking 
globular domains, SAXS can describe the conformational 
space explored by the protein, decipher the functional struc- 
tural organization of multidomain complexes, detect subtle 
rigidities important for the function along the polypeptide 
chain, and provide a wealth of other information on the 
structural properties of IDPs. Finally, the immense capacity
of SAXS to decipher the structural features of IDPs, with or
without globular domains, is fully realized when it is
combined with other biophysical and computational
techniques, which give access to a plethora of information
and thereby allow the protein to be described as a whole
across structural scales. Because these integrated approaches
at low and high resolution make it possible to examine an
ensemble of generated structures that reconciles all of the
experimental data, minority conformers that may be critical
for the function of the protein, preformed structural elements
likely to be binding motifs, or any other structural features
crucial for biological activity may be observed. The examples
described above illustrate how crucial and fundamental
questions on the function of IDPs can be addressed and
reveal insights into their unique structure-function
relationships. The present state of the art of SAXS
approaches, together with future developments in SAXS
aiming to integrate information from more and more
complementary techniques, as well as in SANS with the first
examples discussed above, opens new avenues towards a
comprehensive understanding of the structural and biological
activity of these proteins with their unique properties.
ABBREVIATIONS



CD = Circular Dichroism 

DLS = Dynamic Light Scattering 

FRET = Förster/Fluorescence Resonance Energy Transfer

IDP = Intrinsically Disordered Protein

IDR = Intrinsically Disordered Region 

MD = Molecular Dynamics 

NMR = Nuclear Magnetic Resonance 

NOE = Nuclear Overhauser Effect 

PFG-NMR = Pulsed-Field Gradient NMR

PRE = Paramagnetic Relaxation Enhancement 

RDC = Residual Dipolar Coupling 

rmsd = root mean square deviation 

SAS = Small Angle Scattering of X-rays or neutrons

SAXS = Small Angle X-ray Scattering 

SANS = Small Angle Neutron Scattering 